# Full text of "Opera Magistris (Elements of Applied Mathematics)"

Opera Magistris, 3rd Edition

Compendium on Elementary Applied Mathematics for Engineers

Co-editors: God

Supervisors: Leon HARMEL, F.D.C. Tigrou

Vincent ISOZ

2017-03-29, Revision 3.5

Revision History

| Date | Author(s) | Description |
|---|---|---|
| 2017-04-01 | VI | French 3rd Edition translation into English (6,000 first pages freely available!). Translation progress ~96% |

## Contents

- 1 Warnings 2
- 2 Acknowledgements 21
- 3 Introduction 23
- 4 Arithmetic 52
- 5 Algebra 767
- 6 Analysis 1354
- 7 Geometry 1517
- 8 Mechanics 1964
- 9 Electromagnetism 2584
- 10 Atomistic 2906
- 11 Cosmology 3216
- 12 Chemistry 3593
- 13 Theoretical Computing 3675
- 14 Social Sciences 4137
- 15 Engineering 4809
- 16 Epilogue 5439
- 17 Biographies 5441
- 18 Chronology 5534
- 19 Humour 5565
- 20 Links 5645
- 21 Quotes 5655
- 22 Change Log 5662
- 23 Nomenclature 5680
- List of Figures 5684
- List of Tables 5719
- List of Algorithms 5725
- Bibliography 5727
- Index 5733
- 24 Donate 5783

## Table of Contents

### 1 Warnings 2

- 1 Impressum 3
- 1.1 Use of content 3
- 1.2 How to use this book 4
- 1.2.1 Ancillaries 4
- 1.3 Data Protection 7
- 1.4 Use of data 7
- 1.5 Data transmission 7
- 1.6 Agreement 7
- 1.7 Errata 7
- 2 License 8
- 2.1 Preamble 8
- 2.2 Applicability and Definitions 9
- 2.3 Verbatim Copying 10
- 2.4 Copying in Quantity 10
- 2.5 Modifications 11
- 2.6 Combining Documents 12
- 2.7 Collections of Documents 13
- 2.8 Aggregation with independent Works 13
- 2.9 Translation 13
- 2.10 Termination 13
- 2.11 Future revisions of this License 14
- 3 Roadmap 15

### 2 Acknowledgements 21

### 3 Introduction 23

- 1 Forewords 24
- 2 Methods 31
- 2.1 Descartes’ Method 34
- 2.2 Archimedean Oath 35
- 2.3 Scientific Publication Rules (SPR) 36
- 2.4 Scientific Mainstream Media communication 38
- 3 Vocabulary 39
- 3.1 On Sciences 40
- 3.2 Terminology 43
- 4 Science and Faith 46
- 4.0.1 Baloney detection kit 48

### 4 Arithmetic 52
- 1 Proof Theory 54
- 1.0.1 Foundations Crisis 55
- 1.1 Paradoxes 59
- 1.1.1 Hypothetical-Deductive Reasoning 61
- 1.2 Propositional Calculus 62
- 1.2.1 Propositions (premises) 63
- 1.2.2 Connectors 65
- 1.2.3 Decision procedures 71
- 1.2.3.1 Non-axiomatic procedural decisions 72
- 1.2.3.2 Axiomatic procedural decisions 72
- 1.2.4 Quantifiers 77
- 1.3 Predicate Calculus 79
- 1.3.1 Grammar 79
- 1.3.2 Language 80
- 1.3.2.1 Symbols 80
- 1.3.2.2 Terms 81
- 1.3.2.3 Formulas 83
- 1.4 Proofs 86
- 1.4.1 Rules of Proofs 88
- 2 Numbers 97
- 2.1 Digital Bases 100
- 2.2 Type of Numbers 103
- 2.2.1 Natural Integer Numbers 103
- 2.2.1.1 Peano axioms 105
- 2.2.1.2 Odd, Even and Perfect Numbers 106
- 2.2.1.3 Prime Numbers 106
- 2.2.2 Relative Integer Numbers 108
- 2.2.3 Rational Numbers 109
- 2.2.4 Irrational Numbers 113
- 2.2.5 Real Numbers 115
- 2.2.6 Transfinite Numbers 117
- 2.2.7 Complex Numbers 120
- 2.2.7.1 Geometric Interpretation of Complex Numbers 125
- 2.2.7.1.1 Fresnel Vectors (phasors) 129
- 2.2.7.2 Transformation in the plane 130
- 2.2.8 Quaternion Numbers 135
- 2.2.8.1 Matrix Interpretation of Quaternions 141
- 2.2.8.2 Rotations with Quaternions 142
- 2.2.9 Algebraic and Transcendental Numbers 148
- 2.2.10 Universe Numbers (normal numbers) 151
- 2.2.11 Abstract Numbers (variables) 152
- 2.2.11.1 Domain of a Variable 153
- 3 Arithmetic Operators 156
- 3.1 Binary Relations 156
- 3.1.1 Equalities 157
- 3.1.2 Comparators 158
- 3.2 Fundamental Arithmetic Laws 167
- 3.2.1 Addition 167
- 3.2.2 Subtraction 171
- 3.2.3 Multiplication 174
- 3.2.4 Division 177
- 3.2.4.1 n-root 181
- 3.3 Arithmetic Polynomials 183
- 3.4 Absolute Value 184
- 3.5 Calculation Rules (operators priorities) 187
- 4 Number Theory 192
- 4.1 Principle of good order 192
- 4.2 Induction Principle 193
- 4.3 Divisibility 195
- 4.3.1 Euclidean Division 197
- 4.3.1.1 Greatest common divisor 199
- 4.3.2 Euclidean Algorithm 202
- 4.3.3 Least Common Multiple 206
- 4.3.4 Fundamental Theorem of Arithmetic 210
- 4.3.5 Congruences (modular arithmetic) 211
- 4.3.5.1 Congruence Class 214
- 4.3.5.2 Complete set of residues 217
- 4.3.5.3 Chinese remainder theorem 218
- 4.3.6 Continued fraction 222
- 5 Set Theory 231
- 5.1 Zermelo-Fraenkel Axiomatic 235
- 5.1.1 Cardinals 240
- 5.1.2 Cartesian Product 243
- 5.1.3 Intervals 244
- 5.2 Set Operations 245
- 5.2.1 Inclusion 246
- 5.2.2 Intersection 246
- 5.2.3 Union 248
- 5.2.4 Difference 249
- 5.2.5 Symmetric Difference 250
- 5.2.6 Product 251
- 5.2.7 Complementarity 251
- 5.3 Functions and Applications 253
- 5.3.1 Cantor-Bernstein Theorem 260
- 6 Probabilities 269
- 6.1 Event Universe 270
- 6.2 Kolmogorov’s Axioms 271
- 6.3 Conditional Probabilities 277
- 6.3.1 Conditional Expectation 283
- 6.3.2 Bayesian Networks 285
- 6.4 Martingales 297
- 6.5 Combinatorial Analysis 299
- 6.5.1 Simple Arrangements with Repetition 300
- 6.5.2 Simple Permutations without Repetitions 301
- 6.5.3 Simple Permutations with Repetitions 302
- 6.5.4 Simple Arrangements with Repetitions 303
- 6.5.5 Simple Combinations without Repetitions 304
- 6.5.6 Simple Combinations with Repetitions 306
- 6.6 Markov Chains 308
- 7 Statistics 313
- 7.1 Samples 315
- 7.2 Averages 316
- 7.2.1 Laplace Smoothing 333
- 7.2.2 Means and Averages properties 334
- 7.3 Type of variables 339
- 7.3.1 Discrete Variables and Moments 340
- 7.3.1.1 Mean and Deviation of Discrete Random Variables 341
- 7.3.1.2 Discrete Covariance 351
- 7.3.1.2.1 Anscombe’s famous quartet 356
- 7.3.1.3 Mean and Variance of the Average 358
- 7.3.1.4 Coefficient of Correlation 360
- 7.3.2 Continuous Variables and Moments 365
- 7.4 Fundamental postulate of statistics 366
- 7.5 Diversity Index 367
- 7.6 Distribution Functions (probabilities laws) 369
- 7.6.1 Discrete Uniform Distribution 371
- 7.6.2 Bernoulli Distribution 374
- 7.6.3 Geometric Distribution 375
- 7.6.4 Binomial Distribution 379
- 7.6.5 Negative Binomial Distribution 386
- 7.6.6 Hypergeometric Distribution 391
- 7.6.7 Multinomial Distribution 397
- 7.6.8 Poisson Distribution 401
- 7.6.9 Normal & Gauss-Laplace Distribution 404
- 7.6.9.1 Sum of two random Normal variables 412
- 7.6.9.2 Product of two random Normal variables 414
- 7.6.9.3 Bivariate Normal Distribution 416
- 7.6.9.4 Normal Reduced Centered Distribution 421
- 7.6.9.5 Henry’s Line 422
- 7.6.9.6 Q-Q plot 426
- 7.6.10 Log-Normal Distribution 428
- 7.6.11 Continuous Uniform Distribution 431
- 7.6.12 Triangular Distribution 435
- 7.6.13 Pareto Distribution 437
- 7.6.14 Exponential Distribution 443
- 7.6.15 Cauchy Distribution 445
- 7.6.16 Beta Distribution 448
- 7.6.17 Gamma Distribution 451
- 7.6.18 Generalized Gamma Distribution 456
- 7.6.19 Chi-Square (Pearson) Distribution 457
- 7.6.20 Student Distribution 461
- 7.6.21 Fisher Distribution 465
- 7.6.22 General Folded Normal Distribution 467
- 7.6.22.1 Half-normal distribution 469
- 7.6.23 Benford Distribution 473
- 7.7 Likelihood Estimators 476
- 7.7.1 Normal Distribution MLE 478
- 7.7.2 Poisson Distribution MLE 482
- 7.7.3 Binomial (and Geometric) Distribution MLE 483
- 7.7.4 Weibull Distribution MLE 484
- 7.7.5 Gamma Distribution MLE 487
- 7.8 Finite Population Correction Factor 488
- 7.9 Confidence Intervals 491
- 7.9.1 C.I. on the Mean with known Variance 492
- 7.9.2 C.I. on the Variance with known Mean 497
- 7.9.3 C.I. on the Variance with empirical Mean 502
- 7.9.4 C.I. on the Mean with known unbiased Variance 504
- 7.9.5 Binomial exact Test 507
- 7.9.6 C.I. for a Proportion 510
- 7.9.6.1 Test of equality of two Proportions 514
- 7.9.7 Sign Test 516
- 7.9.8 Mood’s Median Test 518
- 7.9.9 Poisson Test (1 sample) 520
- 7.9.10 Poisson Test (2 samples) 523
- 7.9.11 Confidence/Tolerance/Prediction Interval 526
- 7.10 Weak Law of Large Numbers 528
- 7.11 Characteristic Function 532
- 7.12 Central Limit Theorem 536
- 7.13 Univariate Hypothesis and Adequation tests 542
- 7.13.1 Direction of hypothesis test and p-values 545
- 7.13.2 Fisher’s method for multiple p-values 552
- 7.13.2.1 Simpson’s Paradox (sophism) 555
- 7.13.3 Power of a test 557
- 7.13.4 Power of the one sample Z-test 559
- 7.13.5 Power of the one and two samples P-test 561
- 7.13.6 Fieller’s test (ratio of two means) 563
- 7.13.7 Analysis Of VAriance (ANOVA) 568
- 7.13.7.1 Analysis of Variance with one fixed factor 569
- 7.13.7.1.1 Contrasts 581
- 7.13.7.2 Analysis of Variance with two fixed factors without repetitions 584
- 7.13.7.3 Analysis of Variance with two fixed factors with repetitions 600
- 7.13.7.4 Multifactor ANOVA with Repeated measures 606
- 7.13.7.5 Latin Square ANOVA 606
- 7.13.7.6 Graeco-Latin Square ANOVA 610
- 7.13.7.7 Multivariate Analysis of Variance (MANOVA) 617
- 7.13.8 Equivalence tests 622
- 7.13.9 Cochran C test 624
- 7.13.10 Adequation Tests (goodness of fit tests) 626
- 7.13.10.1 Pearson’s chi-squared GoF test 626
- 7.13.10.2 Kolmogorov-Smirnov GoF test 632
- 7.13.10.3 Ryan-Joiner GoF test 638
- 7.13.10.4 Anderson-Darling GoF test 642
- 7.13.11 Likelihood-ratio tests 652
- 7.14 Robustness 654
- 7.14.1 Rank Statistics 656
- 7.14.1.1 L-Statistics 656
- 7.14.1.2 Ranks Distribution Law 657
- 7.14.1.3 Wilcoxon Rank Sum Test 658
- 7.14.1.4 Mann-Whitney Rank Sum Test 667
- 7.14.1.5 Treatment of equalities 673
- 7.14.1.6 One sample Wilcoxon rank sum signed test 675
- 7.14.1.7 Wilcoxon rank sum signed test for two paired samples 679
- 7.14.1.8 Kruskal-Wallis test 681
- 7.14.1.9 Friedman Test 685
- 7.14.1.10 Spearman Rank Correlation Coefficient 688
- 7.14.2 Range Statistics 691
- 7.14.2.1 Tukey’s Range Test 694
- 7.14.3 Extreme Value Theory 698
- 7.15 Multivariate Statistics 699
- 7.15.1 Principal Component Analysis 700
- 7.15.1.1 SVD and PCA 716
- 7.15.2 Correspondence Factorial Analysis (AFC) 718
- 7.15.3 Chi-2 Test of Independence 723
- 7.15.4 Cramer’s V 727
- 7.15.5 Exact Fisher Test 731
- 7.15.6 Cohen’s kappa agreement 736
- 7.15.7 McNemar’s test 739
- 7.16 Survival Statistics 742
- 7.16.1 Kaplan-Meier Survival Rate 743
- 7.16.2 Cochran-Mantel-Haenszel tests 748
- 7.17 Propagation of Errors (experimental uncertainty analysis) 757
- 7.17.1 Absolute and Relative Uncertainties (Direct calculation of bias) 757
- 7.17.2 Statistical Errors 758
- 7.17.3 Repeatability 760
- 7.17.4 Error propagation (linearized approximation) 761
- 7.17.5 Significant Numbers 762
- 7.18 A World without statistics 764

### 5 Algebra 767

- 1 Calculus 769
- 1.1 Equations and Inequations 770
- 1.1.1 Equations 771
- 1.1.2 Inequations 775
- 1.2 Remarkable Identities 779
- 1.3 Polynomials 783
- 1.3.1 Euclidean Division of Polynomials 787
- 1.3.2 Factorization Theorem of Polynomials 789
- 1.3.3 Diophantine equation 790
- 1.3.4 First order univariate Polynomial and Equations 791
- 1.3.5 Second order univariate Polynomial and Equations 792
- 1.3.5.1 Irrational Equations 796
- 1.3.5.2 Golden Number 798
- 1.3.6 Third order univariate Polynomial and Equations 798
- 1.3.7 Fourth order univariate Polynomial and Equations 801
- 1.3.8 Trigonometric Polynomials 804
- 1.3.9 Cyclotomic Polynomials 804
- 1.3.10 Legendre Polynomials 807
- 2 Set Algebra 812
- 2.1 Groups Algebra and Geometry 812
- 2.1.1 Cyclic Groups 813
- 2.1.2 Transformations Groups 816
- 2.1.3 Group of Symmetries 824
- 2.1.3.1 Orbits and Stabilizers 828
- 2.1.4 Permutations Groups 829
- 2.2 Galois Theory 839
- 2.2.1 Elementary symmetric and Invariant Polynomials 839
- 2.2.2 General Vieta’s formulas 843
- 3 Differential and Integral Calculus 844
- 3.1 Differential Calculus 844
- 3.1.1 Differentials 854
- 3.1.2 Usual Derivatives 861
- 3.1.3 Implicit Differentiation 874
- 3.1.4 Smoothness 880
- 3.2 Integral Calculus 881
- 3.2.1 Definite Integral 881
- 3.2.2 Indefinite Integral 889
- 3.2.3 Double Integral 896
- 3.2.3.1 Fubini’s theorem 898
- 3.2.4 Integration by Substitution 899
- 3.2.4.1 Jacobian 901
- 3.2.5 Integration by Parts 906
- 3.2.6 Usual Primitives 908
- 3.2.7 Integral representation of first kind Bessel’s function 929
- 3.2.8 Dirac Function 933
- 3.2.9 Gamma Euler Function 935
- 3.2.9.1 Euler-Mascheroni Constant 939
- 3.2.10 Curvilinear Integrals 941
- 3.2.10.1 Curvilinear Integral of a scalar field 941
- 3.2.10.2 Curvilinear Integral of a vector field 942
- 3.2.11 Integrals involving parametric equations 945
- 3.3 Differential Equations 948
- 3.3.1 First order Differential Equations 949
- 3.3.2 Linear Differential Equations 949
- 3.3.3 Resolution Methods of Differential Equations 951
- 3.3.3.1 Method of characteristic polynomial 952
- 3.3.3.1.1 Resolution of the H.E. of the first order L.D.E. with constant coefficients 952
- 3.3.3.1.2 Resolution of the H.E. of the first order L.D.E. with non-constant coefficients 953
- 3.3.3.1.3 Resolution of the H.E. of the second order L.D.E. with constant coefficients 954
- 3.3.3.2 Integrating Factor Method (Euler’s Method) 958
- 3.3.3.3 Method of separation of variables 961
- 3.3.3.4 Method of constant variation 962
- 3.3.4 Classification of partial differential equations 965
- 3.4 Systems of Differential Equations 969
- 3.5 Regular Methods of Perturbations 974
- 3.5.1 Perturbation theory for algebraic equations 974
- 3.5.2 Perturbation theory of differential equations 976
- 4 Sequences and Series 980
- 4.1 Sequences 980
- 4.1.1 Arithmetic Sequences 981
- 4.1.2 Harmonic Sequences 984
- 4.1.3 Geometric Sequences 985
- 4.1.4 Cauchy Sequence 986
- 4.1.5 Fibonacci Sequence 990
- 4.1.6 Logic Sequences/Psychologist Sequences 991
- 4.2 Series 992
- 4.2.1 Gauss Series 993
- 4.2.1.1 Bernoulli’s Numbers and Polynomials 996
- 4.2.2 Arithmetic Series 1001
- 4.2.3 Geometric Series 1002
- 4.2.3.1 Zeta function and Euler’s identity 1003
- 4.2.4 Telescoping Series 1007
- 4.2.5 Grandi’s Series 1008
- 4.2.6 Taylor and Maclaurin Series 1011
- 4.2.6.1 Usual Maclaurin developments 1016
- 4.2.6.2 Taylor series of bivariate functions (multivariate Taylor series) 1024
- 4.2.6.3 Quadratic Form 1025
- 4.2.6.4 Lagrange Remainder 1028
- 4.2.6.5 Taylor Series with Integral Remainder 1030
- 4.2.7 Fourier Series (trigonometric series) 1032
- 4.2.7.1 Power of a signal 1051
- 4.2.7.2 Fourier Transform 1053
- 4.2.8 Bessel Series 1062
- 4.2.8.1 Zero order Bessel’s Functions 1062
- 4.2.8.2 n order Bessel’s Functions 1062
- 4.2.8.3 Bessel’s Differential Equations of order n 1068
- 4.3 Convergence Criteria 1069
- 4.3.1 Integral Test 1069
- 4.3.2 D’Alembert Rule 1070
- 4.3.3 Alternating Series Test 1073
- 4.3.4 Fixed Point Theorem 1073
- 4.4 Generating Functions (transformation of a sequence into a series) 1075
- 4.4.1 Ordinary Generating Functions (transformation of a sequence into a series) 1075
- 4.4.1.1 Composition of Generating functions 1078
- 4.4.2 Multivariate Generating Functions 1080
- 4.4.3 Functional Generating Functions 1080
- 5 Vector Calculus 1083
- 5.1 Concept of Arrow 1085
- 5.2 Set of Vectors 1086
- 5.2.1 Pseudo-Vectors 1088
- 5.2.2 Multiplication by a scalar 1090
- 5.2.2.1 Rule of three 1090
- 5.3 Vector Spaces 1092
- 5.3.1 Linear Combinations 1093
- 5.3.2 Sub-vector spaces 1093
- 5.3.3 Generating families 1094
- 5.3.4 Linear Dependence or Independence 1094
- 5.3.5 Base of a vectorial space 1095
- 5.3.6 Direction Angles 1097
- 5.3.7 Dimensions of a vector space 1098
- 5.3.8 Extension of a free family 1098
- 5.3.9 Rank of a finite family 1099
- 5.3.10 Direct Sums 1100
- 5.3.11 Affine spaces 1101
- 5.4 Euclidean Vector Spaces 1104
- 5.4.1 Scalar Product (Dot Product) 1106
- 5.4.1.1 Cauchy-Schwarz inequality 1111
- 5.4.1.2 Triangular Inequalities 1112
- 5.4.1.3 General Scalar/Dot Product 1113
- 5.4.2 Cross Product 1114
- 5.4.3 Mixed Product (triple product) 1121
- 5.5 Vectorial Functional Space 1122
- 5.6 Hermitian Vector Space 1125
- 5.6.1 Hermitian Inner Product 1126
- 5.6.2 Types of Vector Spaces 1127
- 5.7 System of Coordinates 1128
- 5.7.1 Cartesian (rectangular) Coordinate System 1128
- 5.7.2 Spherical Coordinate System 1130
- 5.7.3 Cylindrical Coordinate System 1135
- 5.7.4 Polar Coordinate System 1137
- 5.8 Differential Operators 1139
- 5.8.1 Gradients of Scalar Field 1141
- 5.8.2 Gradients of Vector Field 1149
- 5.8.3 Divergences of a Vector Field 1150
- 5.8.4 Rotationals of a Vector Field (Curl) 1158
- 5.8.5 Laplacians of Scalar Field (Laplace Operator) 1168
- 5.8.6 Laplacians of a Vector Field 1172
- 5.8.7 Remarkable Identities 1179
- 5.8.8 Summary 1182
- 6 Linear Algebra 1186
- 6.0.1 Linear Systems 1190
- 6.1 Linear Transformations 1194
- 6.2 Matrices 1196
- 6.2.1 Rank of a matrix 1197
- 6.2.2 Matrix Algebra 1202
- 6.2.3 Type of Matrices 1205
- 6.2.4 Determinant 1213
- 6.2.4.1 Derivative of a Determinant 1225
- 6.2.4.2 Determinant Cofactor and Matrix Inverse 1226
- 6.3 Change of Basis (frames) 1228
- 6.4 Eigenvalues and Eigenvectors 1231
- 6.4.1 Rotation Matrices and Eigenvalues 1234
- 6.5 Spectral Theorem 1237
- 6.6 Singular Value Decomposition (SVD) 1242
- 6.6.1 Singular Vectors 1243
- 7 Tensor Calculus 1251
- 7.1 Tensor 1252
- 7.2 Indicial Notation 1254
- 7.2.1 Summation on multiple index 1255
- 7.2.2 Kronecker Symbol 1255
- 7.2.3 Antisymmetric Symbol (Levi-Civita symbol) 1256
- 7.3 Metric and Signature 1262
- 7.4 Gram’s Determinant 1264
- 7.5 Contravariant and Covariant Components 1270
- 7.6 Operation in Basis 1272
- 7.6.1 Gram-Schmidt Orthogonalization Method 1272
- 7.6.2 Change of Basis 1273
- 7.6.3 Reciprocal Basis (Dual Basis) 1274
- 7.7 Euclidean Tensors (cartesian tensor) 1279
- 7.7.1 Fundamental Tensor 1279
- 7.7.2 Tensor product (dyadic) of two vectors and matrices 1280
- 7.7.3 Tensor Spaces 1283
- 7.7.4 Linear combination of tensors 1287
- 7.7.5 Contraction of indices 1287
- 7.7.5.1 Raising and lowering indices 1288
- 7.8 Special Tensors 1292
- 7.8.1 Symmetric Tensor 1292
- 7.8.2 Antisymmetric Tensor 1295
- 7.8.3 Fundamental Tensor 1298
- 7.9 Curvilinear Coordinates 1299
- 8 Spinor Calculus 1332
- 8.1 Unit Spinor 1333
- 8.2 Geometric Properties 1339
- 8.2.1 Plane Symmetries 1339
- 8.2.2 Rotations 1342
- 8.2.3 Properties of Pauli Matrices 1347

### 6 Analysis 1354

- 1 Functional Analysis 1355
- 1.1 Representations 1356
- 1.1.1 Tabular Representation 1356
- 1.1.2 Graphical Representation 1357
- 1.1.2.1 2D representations 1358
- 1.1.2.2 3D representations 1362
- 1.1.2.3 2D Vector representations 1368
- 1.1.2.4 Properties of visual representations 1371
- 1.1.3 Analytical Representation 1376
- 1.2 Functions 1379
- 1.2.1 Limits and Continuity of Functions 1391
- 1.2.1.1 Limit laws 1398
- 1.2.2 Asymptotes 1399
- 1.3 Logarithms 1405
- 1.4 Transformations 1413
- 1.4.1 Fourier Transform 1413
- 1.4.2 Laplace Transform 1413
- 1.4.3 Hilbert Transform 1414
- 1.5 Functional dot product (inner product) 1414
- 2 Complex Analysis 1419
- 2.1 Linear Applications 1419
- 2.2 Holomorphic Functions 1429
- 2.2.1 Orthogonality of real and imaginary iso-curves 1435
- 2.3 Complex Logarithm 1437
- 2.4 Complex Integral Calculus 1440
- 2.4.1 Convergence of a complex series 1447
- 2.5 Path Decomposition 1456
- 2.5.1 Inverse Path 1457
- 2.6 Laurent Series 1459
- 2.7 Singularities 1467
- 2.8 Residue Theorem 1470
- 2.8.1 Pole at infinity 1474
- 3 Topology 1476
- 3.1 General Topology 1477
- 3.1.1 Topological Spaces 1478
- 3.2 Metric Space and Distance 1479
- 3.2.1 Equivalent Distances 1484
- 3.2.2 Lipschitz Functions 1484
- 3.2.3 Continuity and Uniform Continuity 1487
- 3.3 Open and Closed Set 1489
- 3.3.1 Balls 1490
- 3.3.2 Partitions 1494
- 3.3.3 Formal Ball 1496
- 3.3.4 Diameter 1497
- 3.4 Varieties 1499
- 3.4.1 Surfaces Homeomorphism 1500
- 3.4.2 Differential Varieties 1504
- 4 Measure Theory 1506
- 4.1 Measurable Spaces 1506
- 4.1.1 Monotone Classes 1514

### 7 Geometry 1517

- 1 Trigonometry 1519
- 1.1 Radian 1519
- 1.2 Circle Trigonometry 1521
- 1.2.1 Remarkable trigonometric triangle identities 1530
- 1.2.1.1 Law of Cosines 1534
- 1.2.1.2 Law of Sines 1536
- 1.3 Hyperbolic Trigonometry 1537
- 1.3.1 Remarkable hyperbolic identities 1544
- 1.4 Spherical Trigonometry 1546
- 1.5 Solid Angle 1551
- 2 Euclidean Geometry 1554
- 2.1 Objects of Euclidean Geometry 1554
- 2.1.1 Dimensions 1556
- 2.2 Euclid Constructions 1564
- 2.2.1 Segments and Lines 1565
- 2.2.1.1 Quantities of the same type 1566
- 2.3 Plane Geometry 1571
- 2.3.1 Displacements and Turnarounds 1572
- 2.3.2 Plane angles 1573
- 2.3.2.1 Angle Measurements 1580
- 2.3.2.2 Units of Angle Measurements 1583
- 2.3.2.3 Bisector 1586
- 2.3.3 Triangles 1588
- 2.3.3.1 Equal Triangles (congruent triangles) 1589
- 2.3.3.2 Isosceles Triangles 1592
- 2.3.3.3 Equilateral Triangles 1596
- 2.3.3.4 Right Triangle 1596
- 2.3.3.5 Right Isosceles Triangle 1597
- 2.3.3.6 Inequalities in the triangles 1599
- 2.3.3.7 Triangles remarkable interior lines 1601
- 2.3.3.8 Pythagorean theorem 1602
- 2.3.3.9 Thales’ Theorem (intercept theorem) 1603
- 2.3.4 Parallelism 1608
- 2.3.5 Circle 1610
- 2.3.5.1 Circumscribed circle theorem 1613
- 2.3.5.2 Inscribed circle theorem 1614
- 2.3.5.3 Thales’ theorem of the circle 1616
- 2.3.5.4 Central angle theorem 1617
- 2.4 Hilbert’s Axioms 1621
- 2.4.1 Incidence Axioms (axioms of association) 1622
- 2.4.2 Order Axioms 1622
- 2.4.3 Congruence Axioms 1623
- 2.4.4 Continuity Axioms 1623
- 2.4.5 Parallels Axioms 1624
- 2.5 Barycenter (centroid) 1625
- 2.6 Geometric Transformations 1630
- 2.6.1 Translation 1632
- 2.6.2 Homothety (scaling) 1633
- 2.6.3 Shear (skew) transformation 1635
- 2.6.4 Rotation 1637
- 2.6.4.1 Gimbal lock 1642
- 2.6.4.2 Euler angles 1646
- 2.6.5 Reflection 1648
- 3 Non-Euclidean Geometry 1651
- 3.1 Axioms of non-euclidean geometry 1653
- 3.2 Geodesic and Metric Equation 1654
- 3.3 Riemann Spaces 1658
- 4 Projective Geometry 1664
- 4.1 Conical Perspective (Central Perspective) 1665
- 4.1.1 Images of Points 1668
- 4.1.2 Images of Straight Lines 1684
- 4.2 Affine projections 1691
- 4.2.1 Isometric perspective 1693
- 4.2.2 Oblique perspective 1699
- 4.2.3 Orthogonal projection 1700
- 4.2.4 Spherical projection 1705
- 4.2.4.1 Stereographic projection 1706
- 4.2.4.2 Cylindrical projection 1713
- 4.2.4.3 Mercator projection 1714
- 4.2.5 Other perspectives 1717
- 4.3 Homogeneous Coordinates (projection coordinates) 1720
- 4.3.1 V 2 Projective Space 1721
- 4.3.2 V 3 Projective Space 1723
- 5 Analytical Geometry 1726
- 5.1 Conics 1726
- 5.1.1 Algebraic approach 1727
- 5.1.2 Geometric Approach 1736
- 5.1.3 Dandelin Theorem (Dandelin Spheres) 1749
- 5.1.4 Classification of conics by the determinant 1750
- 5.2 Parametrizations 1755
- 5.2.1 Equation of the Plane 1755
- 5.2.2 Equation of the Straight line 1759
- 5.2.2.1 Distance from a line to a point 1761
- 5.2.2.2 Line defined by the intersection of planes 1762
- 5.2.2.3 Parametric equation of a line in ℝ³ 1762
- 5.2.3 Equation of a Square 1764
- 5.2.4 Equation of a Cycloid 1765
- 5.2.5 Equation of an Epicycloid 1766
- 5.2.6 Equation of a Hypocycloid 1768
- 5.2.7 Surface of revolution 1770
- 5.2.7.1 Cone 1770
- 5.2.7.2 Sphere 1772
- 5.2.7.3 Ellipsoid (spheroid, geoid) 1774
- 5.2.7.4 Cylinder 1777
- 5.2.7.5 Paraboloid 1778
- 5.2.7.6 Hyperboloid 1780
- 5.2.7.7 Torus 1785
- 6 Differential Geometry 1787
- 6.1 Parametric Curves 1787
- 6.2 Isolines 1793
- 6.3 Frenet Frame 1799
- 6.4 Surface Patches 1810
- 6.4.1 Metric of a Surface Patch 1811
- 6.4.1.1 Regularity of a Surface 1813
- 7 Geometric Shapes 1817
- 7.1 Known Surfaces (Areas) 1818
- 7.1.1 Polygons 1818
- 7.1.2 Rectangle 1823
- 7.1.3 Square 1824
- 7.1.4 Unspecified Triangle 1826
- 7.1.5 Isosceles Triangle 1830
- 7.1.6 Equilateral Triangle 1832
- 7.1.7 Right Triangle 1834
- 7.1.8 Trapezoid 1835
- 7.1.9 Parallelogram 1837
- 7.1.10 Hexagon 1839
- 7.1.11 Rhombus 1844
- 7.1.12 Circle 1845
- 7.1.13 Ellipse 1847
- 7.2 Known Volumes 1850
- 7.2.1 Polyhedron 1851
- 7.2.1.1 Parallelepiped 1851
- 7.2.1.1.1 Moment of Inertia of a rectangular plate 1853
- 7.2.1.1.2 Moment of Inertia of a triangular plate 1855
- 7.2.1.2 Pyramid 1857
- 7.2.1.2.1 Moment of Inertia of a regular square pyramid 1858
- 7.2.1.3 Right Prism 1859
- 7.2.1.4 Regular Polyhedron 1860
- 7.2.1.5 Regular Tetrahedron 1865
- 7.2.1.6 Regular hexahedron (cube) 1868
- 7.2.1.7 Regular octahedron 1868
- 7.2.1.8 Regular Icosahedron 1872
- 7.2.1.9 Regular Dodecahedron 1876
- 7.2.2 Solids of Revolution 1881
- 7.2.2.1 Cylinder 1883
- 7.2.2.2 Cone 1887
- 7.2.2.3 Sphere 1889
- 7.2.2.4 Torus 1893
- 7.2.2.5 Ellipsoid (spheroid, geoid) 1897
- 7.2.2.6 Paraboloid 1902
- 7.2.2.7 Wine Barrel with Circular Section 1904
- 8 Graph Theory 1908
- 8.1 Type of Graphs and Structures 1909
- 8.2 Graph Adjacency matrix 1931
- 8.3 Categories 1936
- 9 Knot Theory 1940
- 9.1 Braids Representation 1940
- 9.1.1 Braids Group 1942
- 9.2 Knot Representation 1945
- 9.2.1 Knots Group 1947
- 9.3 Tait’s Knot 1950
- 9.4 Mathematical Formalisation 1954
- 9.4.1 Planar Representation 1962

### 8 Mechanics 1964

- 1 Principia 1966
- 1.1 System of Units 1969
- 1.1.1 Dimensional Analysis 1976
- 1.1.1.1 Time 1978
- 1.1.1.2 Length 1979
- 1.1.1.3 Mass 1981
- 1.1.1.4 Energy 1984
- 1.1.1.5 Electric Charge 1986
- 1.1.2 Scientific Notation and Metric Prefixes 1989
- 1.1.3 Scales of Measurements 1992
- 1.2 Distributions 2004
- 1.3 Constants 2005
- 1.3.1 Mathematical Constants 2005
- 1.3.2 Universal Constants (fundamental constants) 2006
- 1.3.3 Astronomical/Astrophysical parameters and constants 2008
- 1.3.4 Chemical parameters 2008
- 1.3.5 Material parameters 2009
- 1.3.6 Planck’s constants 2010
- 1.4 Principles of Physics 2014
- 1.4.1 Principle of Causality 2014
- 1.4.2 Principle of Conservation of Energy 2015
- 1.4.3 Principle of Least Action 2017
- 1.4.4 Noether’s Principle (Noether’s theorem) 2017
- 1.4.4.1 Invariance by translation in space 2020
- 1.4.4.2 Invariance by rotation in space 2021
- 1.4.4.3 Invariance by translation in space 2022
- 1.4.4.4 Noether’s theorem 2023
- 1.4.5 Curie’s Principle 2027
- 1.5 Point Spaces 2028
- 2 Analytical/Lagrangian Mechanics 2035
- 2.1 Lagrangian formalism 2037
- 2.1.1 Generalized coordinates and frames 2038
- 2.1.2 Variational Principle 2042
- 2.1.3 Euler-Lagrange Equation 2043
- 2.1.3.1 Beltrami Identity 2052
- 2.1.3.2 Theorem of Variational Calculus 2053
- 2.2 Canonical Formalism 2055
- 2.2.1 Legendre Transform 2055
- 2.2.2 Hamiltonian 2056
- 2.2.3 Poisson bracket 2063
- 2.2.4 Canonical transformations 2067
- 3 Classical Mechanics 2068
- 3.1 Newton’s Laws 2072
- 3.1.1 Newton First Law (Inertia Law) 2072
- 3.1.2 Newton Second Law (Fundamental Principle of Dynamics) 2074
- 3.1.3 Newton Third Law (Law of Action and Reaction) 2077
- 3.2 Center of Mass and Reduced Weight 2078
- 3.2.1 Center of Mass Theorem 2081
- 3.2.2 Guldin’s Theorem 2086
- 3.3 Kinematics of Rectilinear Motion 2089
- 3.3.1 Position 2090
- 3.3.2 Velocity 2090
- 3.3.3 Acceleration 2092
- 3.3.3.1 Osculating Plane 2095
- 3.3.4 Galilean Relativity Principle 2097
- 3.4 Angular Momentum 2101
- 3.4.1 Moments 2106
- 3.4.2 Static Forces 2111
- 3.5 Ballistics 2115
- 3.6 Kinematics of Circular Motion 2120
- 3.7 Energy, Work and Power 2128
- 3.7.1 Conservative vector field 2129
- 3.7.2 Kinetic Energy 2132
- 3.7.2.1 Moment of inertia 2133
- 3.7.2.2 Gyroscope 2150
- 3.7.2.2.1 Classical Approach with precession only 2152
- 3.7.2.2.2 Lagrangian Approach with precession and nutation 2157
- 3.7.2.3 König’s kinetic and angular momentum theorems 2169
- 3.7.2.3.1 First König’s Theorem (König’s angular momentum theorem) 2169
- 3.7.2.3.2 Second König’s Theorem (König’s kinetic energy theorem) 2171
- 3.7.3 Gravitational Potential Energy 2172
- 3.7.3.1 Gravitational Potential Energy of a Material Sphere 2176
- 3.7.4 Conservation of Total Mechanical Energy 2178
- 3.7.4.1 Generalized Newton Law 2180
- 3.7.5 Conservation of Linear Momentum 2184
- 3.7.5.1 Elastic Collision in 1 dimension 2185
- 3.7.5.2 Elastic Collision in 2 dimensions 2187
- 3.7.5.3 Inelastic Collision in 2 dimensions 2189
- 3.7.6 Power 2190
- 3.7.6.1 Power of a turning machine 2191
- 3.7.6.1.1 Power yield 2192
- 3.8 Relative Movements and Inertial Forces 2193
- 3.8.1 Coriolis force and deflection magnitude 2201
- 3.9 Oscillating Movements 2205
- 3.9.1 Newton’s cradle 2205
- 3.9.2 Simple Pendulum 2207
- 3.9.3 Physical Pendulum 2214
- 3.9.4 Elastic Pendulum (spring pendulum) 2216
- 3.9.4.1 One degree of freedom elastic pendulum with/without friction 2217
- 3.9.4.2 Two degrees of freedom elastic pendulum without friction 2221
- 3.9.5 Conical Pendulum 2222
- 3.9.6 Torsion Pendulum 2224
- 3.9.7 Foucault’s Pendulum 2225
- 3.9.8 Huygens’ Pendulum (and brachistochrone curve) 2230
- 3.9.9 Double Pendulum 2237
- 3.9.10 Inverted Pendulum 2242
- 3.10 Tribology 2246
- 3.10.1 Exponential Friction 2253
- 3.10.2 Horizontal Viscous Friction 2255
- 3.10.3 Vertical Viscous Friction 2257
- 3.10.4 Stokes’ Vertical Viscous Friction 2258
- 3.10.5 Stokes’ Horizontal Viscous Friction 2261
- 3.10.6 Friction’s Heat Factor 2264
- 4 Wave Mechanics 2265
- 4.0.1 Wave Function 2265
- 4.0.2 Wave Equation 2266
- 4.1 Type of Waves 2268
- 4.1.1 Periodic Waves 2268
- 4.1.2 Harmonic Waves 2269
- 4.1.2.1 Phase velocity and Group velocity 2270
- 4.1.3 Stationary Waves 2272
- 4.1.4 Vibration Modes in a Stretched String 2275
- 4.1.4.1 Dirichlet Conditions 2277
- 4.1.4.2 Neumann Conditions 2282
- 4.2 Non-relativistic Lagrangian of a String 2284
- 4.3 Vibrational modes of a circular membrane 2288
- 5 Statistical Mechanics 2298
- 5.1 Statistical Information Theory 2298
- 5.2 Boltzmann Law 2305
- 5.3 Statistical Physics Distributions 2310
- 5.3.1 Maxwell Distribution (velocity distribution) 2310
- 5.3.2 Maxwell-Boltzmann Distribution 2316
- 5.3.2.1 Boltzmann Distribution 2322
- 5.3.2.2 Fermi-Dirac Distribution 2324
- 5.3.2.3 Bose-Einstein Distribution 2328
- 5.4 Brownian Motion 2333
- 6 Thermodynamics 2343
- 6.1 Thermodynamic Variables 2344
- 6.2 Thermodynamics Systems 2347
- 6.3 Thermodynamic Transformations 2348
- 6.4 State Variables 2350
- 6.4.1 Phases 2352
- 6.5 Equation of State 2354
- 6.5.1 Ideal Gas Law 2354
- 6.5.2 State equation of a Liquid 2355
- 6.5.3 State equation of Solids 2357
- 6.6 Laws of Thermodynamics 2360
- 6.7 Calorific Capacities (heat capacity) 2362
- 6.8 Internal Energy 2372
- 6.8.1 Work (energy) of Mechanical Forces 2376
- 6.8.2 Enthalpy 2378
- 6.8.3 Laplace’s Law 2381
- 6.8.4 Saint-Venant Thermodynamic Equation 2385
- 6.8.5 Thermoelastic coefficients 2387
- 6.9 Heat 2390
- 6.9.1 Entropy 2392
- 6.9.1.1 Heat Flow 2396
- 6.9.2 Carnot Cycle 2398
- 6.10 Maxwell relations 2402
- 6.11 Continuity Equation 2406
- 6.11.1 Heat Equation 2412
- 6.11.1.1 Fick’s laws of diffusion 2423
- 6.12 Thermal radiation 2426
- 6.12.1 Black Body radiation 2426
- 6.12.1.1 Stefan-Boltzmann law 2428
- 6.12.1.2 Planck’s law 2437
- 7 Continuum Mechanics 2450
- 7.1 Rigid Bodies 2451
- 7.1.1 Pressures 2451
- 7.1.2 Elasticity of Solids 2452
- 7.1.2.1 Hooke’s law 2457
- 7.1.2.2 Shear Modulus 2465
- 7.1.2.3 Compressibility Modulus (bulk modulus) 2473
- 7.1.2.4 Flexural Modulus (bending modulus) 2475
- 7.1.2.5 Transverse Wave in Solids 2478
- 7.2 Liquids 2482
- 7.2.1 Pascal’s Fluid Theorem 2484
- 7.2.2 Viscosity 2486
- 7.2.2.1 Poiseuille’s Law 2489
- 7.2.3 Bernoulli’s Theorem 2491
- 7.2.3.1 Torricelli’s law 2498
- 7.2.3.2 Communicating vessels 2502
- 7.2.3.3 Venturi effect 2504
- 7.2.3.4 Pitot Tube 2507
- 7.2.3.5 Pressure drop (pressure loss) 2509
- 7.2.4 Navier-Stokes Equations 2511
- 7.2.4.1 Incompressible flow 2528
- 7.2.4.2 Compressible flow 2536
- 7.2.4.3 Static flow 2536
- 7.2.4.4 Reynolds number 2537
- 7.2.4.5 Boussinesq approximation (buoyancy) 2540
- 7.2.4.6 Stokes’ law 2542
- 7.2.5 Hydrostatic Pressure 2548
- 7.2.6 Archimedes’ principle 2550
- 7.2.7 Speed of sound in a liquid 2552
- 7.3 Gas 2553
- 7.3.1 Types of Gas 2553
- 7.3.1.1 Perfect Gas 2553
- 7.3.1.2 Real Gas 2557
- 7.3.2 Virial Theorem 2557
- 7.3.3 Kinetic pressures (kinetic theory of gases) 2570
- 7.3.4 Kinetic Temperature 2572
- 7.3.5 Amagat and Dalton’s law 2573
- 7.3.6 Mean free path (in kinetic theory) 2575
- 7.4 Plasmas 2578
- 7.4.1 Plasma Frequency 2580

### 9 Electromagnetism 2584

- 1 Electrostatics 2586
- 1.1 Electric Force 2587
- 1.2 Electric Potential 2591
- 1.2.1 Path Independence 2594
- 1.3 Equipotential and Field lines 2595
- 1.3.1 Infinite straight wire 2597
- 1.3.2 Electric Rigid Dipole 2599
- 1.4 Electric Field Flow 2610
- 1.4.1 Capacitor 2611
- 1.4.1.1 Dielectric strength 2615
- 1.5 Electrostatic potential energy 2616
- 2 Magnetostatics 2618
- 2.1 Ampere’s theorem 2621
- 2.1.1 Infinitely long solenoid 2623
- 2.1.2 Toroidal coils 2624
- 2.1.3 Electromagnet 2626
- 2.1.3.1 Strength of a magnet or electromagnet 2627
- 2.2 Maxwell-Ampere Relation 2629
- 2.3 Biot-Savart law 2630
- 2.3.1 Magnetic field for a current loop 2631
- 2.3.2 Magnetic field for an infinite wire 2633
- 2.4 Magnetic dipole 2635
- 2.5 Lorentz law (Lorentz force) 2643
- 2.5.0.1 Magnetic Vector Potential 2647
- 2.5.0.2 Work of Magnetic Field 2648
- 2.5.1 Classical Hall effect 2651
- 2.5.2 Larmor radius 2654
- 2.5.3 Energy of a magnetic dipole 2660
- 2.6 Langevin treatment of Diamagnetism and Paramagnetism 2663
- 2.6.1 Langevin model of diamagnetism 2663
- 2.6.2 Langevin model of paramagnetism 2666
- 3 Electrodynamics 2671
- 3.1 Maxwell Equations 2672
- 3.1.1 First Maxwell Equation (constant electric flow) 2672
- 3.1.2 Second Maxwell Equation (non-existence of magnetic monopole) 2675
- 3.1.3 Third Maxwell Equation 2676
- 3.1.3.1 Betatron 2679
- 3.1.4 Fourth Maxwell Equation 2683
- 3.1.5 Magnetic Monopoles 2686
- 3.2 Charge conservation equation 2688
- 3.3 Gauge Theory 2689
- 3.3.1 Electromagnetic field tensor 2696
- 3.4 Electromagnetic wave equation 2712
- 3.4.1 Helmholtz equation 2717
- 3.4.2 Energy flow transportation (Poynting vector) 2719
- 3.4.3 Emissions 2722
- 3.5 Synchrotron radiation (bremsstrahlung) 2725
- 3.5.1 Liénard-Wiechert potentials 2730
- 3.5.2 Retarded Electric and Magnetic fields 2737
- 4 Electrokinetics 2756
- 4.1 Kirchhoff’s laws 2757
- 4.1.1 Mesh law (Kirchhoff’s Loop Law) 2758
- 4.1.2 Nodes law (Kirchhoff’s Point Law) 2758
- 4.2 Drude model 2759
- 4.3 Ohm’s law 2765
- 4.3.1 Equivalent Resistance 2768
- 4.3.2 Equivalent Capacitance 2769
- 4.4 Electromotive Force 2771
- 4.4.1 Faraday’s law of induction 2775
- 5 Optics (ray optics) 2777
- 5.1 Sources and Shadows 2777
- 5.2 Colors 2781
- 5.3 Radiometry/Photometry 2788
- 5.3.1 Energy flow 2788
- 5.3.1.1 Beer-Lambert law 2788
- 5.3.2 Light Intensity (Radiant Intensity) 2790
- 5.3.3 Energy Emittance (Radiant Emittance) 2791
- 5.3.4 Radiance and Luminance 2792
- 5.3.4.1 Lambert’s Law 2794
- 5.3.5 Kirchhoff’s law of Radiation 2795
- 5.3.6 Spectral Decomposition 2796
- 5.4 Law of Refraction 2797
- 5.4.1 Refractive index 2801
- 5.4.2 Snell’s law 2804
- 5.4.3 Cherenkov radiation 2810
- 5.5 Descartes’ Formulas 2812
- 5.5.1 Stigmatism 2816
- 5.5.2 Lenses 2821
- 5.5.2.1 Optical Magnification 2833
- 5.5.2.2 Human eyes 2841
- 5.5.3 Triangular Prism 2841
- 5.5.4 Pentaprism 2846
- 5.6 Rainbow 2850
- 6 Wave Optics 2857
- 6.1 Huygens’ principle 2857
- 6.2 Fraunhofer Diffraction 2861
- 6.2.1 Case of a rectangular aperture 2861
- 6.2.1.1 Optical resolution 2868
- 6.2.2 Case of a network of rectangular apertures 2870
- 6.2.3 Young’s interference experiment 2876
- 6.3 Light polarization 2883
- 6.3.1 Linear polarization 2888
- 6.3.2 Elliptical polarization 2889
- 6.3.3 Circular polarization 2890
- 6.3.4 Natural polarization 2891
- 6.3.5 Malus’ law 2894
- 6.4 Coherence and interference 2895
- 6.5 LASER 2901

### 10 Atomistic 2906

- 1 Corpuscular Quantum Physics 2908
- 1.1 Dalton’s model 2909
- 1.2 Thomson’s model 2910
- 1.3 Rutherford’s model 2911
- 1.4 Bohr’s Model 2913
- 1.4.1 Bohr’s Postulates 2913
- 1.4.2 Quantization 2914
- 1.4.3 Hydrogen Type Atoms Model without dragging 2916
- 1.4.4 Hydrogen Type Atoms Model with dragging 2921
- 1.4.5 Neutron Assumption 2924
- 1.5 Wilson and Sommerfeld’s Model 2926
- 1.6 Relativistic Sommerfeld Model 2930
- 1.6.1 Magnetic dipole moment 2945
- 1.6.2 Spin 2948
- 1.6.3 Pauli exclusion principle 2950
- 1.7 Electron configuration (atomic orbital) 2951
- 2 Wave Quantum Physics 2960
- 2.1 Postulates 2961
- 2.1.1 1st Postulate: Quantum State 2962
- 2.1.2 2nd postulate: Time evolution of a quantum state 2964
- 2.1.3 3rd postulate: Observables and operators 2965
- 2.1.4 4th postulate: Measurement of a property 2967
- 2.1.5 5th postulate: Average of a property 2969
- 2.2 Classical principles of uncertainty 2970
- 2.2.1 First classical uncertainty relation 2971
- 2.2.2 Second classical uncertainty relation 2972
- 2.2.3 Third classical uncertainty relation 2973
- 2.3 Quantum algebra 2976
- 2.3.1 Linear functional operators 2976
- 2.3.1.1 Hermitian and Self-adjoint operators 2980
- 2.3.1.2 Commutators and Anticommutators 2983
- 2.3.1.3 Representatives 2987
- 2.3.1.4 Eigenvalues and Eigenfunctions 2989
- 2.3.1.4.1 Orthogonality of eigenfunctions 2990
- 2.3.1.5 Dirac formalism 2993
- 2.3.1.5.1 Kets and Bras 2993
- 2.4 Schrödinger Model 2996
- 2.4.1 de Broglie associated wave 2996
- 2.4.2 Classical Schrödinger Wave Equation 2998
- 2.4.2.1 Schrödinger Hamiltonian 3000
- 2.4.2.2 De Broglie normalization condition 3004
- 2.4.2.3 Bound and unbound states 3006
- 2.4.3 Classical Schrödinger equation of evolution 3008
- 2.4.3.1 Operator of evolution 3008
- 3 Relativistic Quantum Physics 3066
- 3.1 Relativistic Schrödinger evolution equation 3067
- 3.1.1 Antimatter 3068
- 3.2 Generalized Klein-Gordon Equation 3071
- 3.3 Classical free Dirac equation 3077
- 3.4 Linearized Dirac Equation 3090
- 3.5 Generalized Dirac Equation 3102
- 3.6 Pauli Equation 3103
- 4 Nuclear Physics 3112
- 4.1 Nuclear Weapon 3112
- 4.2 Radioactivity 3115
- 4.2.1 Disintegration 3118
- 4.2.1.1 Half-life isotope 3119
- 4.2.2 Activity 3120
- 4.2.2.1 Carbon-14 dating (radiocarbon dating) 3122
- 4.2.2.2 Radioactive Decay chain 3126
- 4.2.3 Two level radioactive cascade 3127
- 4.2.3.1 Secular equilibrium 3129
- 4.2.3.2 Transient equilibrium 3130
- 4.2.3.3 Nonequilibrium 3130
- 4.2.4 Radioactive phenomena 3131
- 4.2.4.1 Nuclear Fusion (1) 3136
- 4.2.4.2 Nuclear Fission (2) 3137
- 4.2.4.3 Alpha Disintegration (3) 3139
- 4.2.4.4 Beta− Disintegration (4) 3148
- 4.2.4.5 Beta+ Disintegration (5) 3149
- 4.2.4.6 Electron capture (6) 3150
- 4.2.4.7 Gamma emission (7) 3151
- 4.2.4.8 Internal conversion (8) 3152
- 4.3 Radiation protection 3154
- 5 Quantum Field Theory 3176
- 5 .
1 Yukawa potential 3179 5.1.1 Mass fields 3181 5.1.2 Non-mass fields 3182 5.2 Euler-Lagrange equation for Fields 3184 5.3 Gauge Theories 3195 5.3.1 Global Gauge invariance 3196 5.3.2 Local Gauge invariance 3198 6 Elementary Particle Physics 3203 6.1 Coupling Constants 3205 6.2 Spin magnetic resonance 3210 11 Cosmology 3216 1 Astronomy (Celestial Mechanics) 3218 1.1 Drake Equation 3218 1.2 Kepler’s Laws 3219 1.2.1 First Kepler’s Law (conicity law) 3219 1.2.2 Second Kepler’s Law (area law) 3220 1.2.2. 1 Time of flight 3222 1.2.3 Third Kepler’s Law (periods’ law) 3226 1.3 Newton Gravitational Law 3229 1 .3. 1 Gaussian Formulation of Newtonian Gravity 3236 1.3.2 Shell Theorem 3238 1.3.3 Orbital speed 3240 1.3.4 Asteroids/Meteors impact velocity 3241 1.3.5 Spherisation of Celestial Bodies 3241 1.3.5. 1 Flattening of Celestial Bodies (rotational flattening) 3243 1.3.6 Stability of Atmospheres 3245 1.4 Roche’s Limit 3246 1.5 Keplerian Orbitals 3248 1 .5. 1 First Binet Formula 3248 1.5.2 Second Binet Formula 3253 1.5.3 Keplerian orbital period 3257 1 .5.4 Classical deflection of light 3258 1 .5.5 Classical precession of perihelia 3260 1.6 Duration of the diurnal arc 3270 xxix 1.6.1 Trigonometric parallax 3277 1.7 Planets’ Motion 3279 1.7.1 Synodic and Sidereal period 3279 1.7.2 Planet’s apparent retrograde motion 3282 1.8 Lagrange Points 3288 1.8.1 Equilibrium points of the first type 3297 1 . 8 . 1 . 1 LI Lagrange point 3297 1.8. 1.2 L2 Lagrange point 3301 1.8. 1.3 L3 Lagrange point 3304 1.8.2 Equilibrium points of the second type 3306 1.8.2. 1 L4, L5 Lagrange points 3306 1.9 Relativistic Doppler-Lizeau Effect 3312 1.9.1 Apparent speed 3316 2 Astrophysics 3320 2.1 Stars 3320 2.1.1 Stellar Physics 3327 2. 1 . 1 . 1 Collapse of an Interstellar Cloud 3327 2. 1.1. 1.1 Limit Mass Cloud for Ionization (rogue planets) 3329 2. 1 . 1 . 1 .2 Limit Mass Cloud for Lusion (black dwarf) . 3330 2. 1 . 1 .2 Nuclear Duration Life 3332 2. 1.1.3 Internal Temperature 3334 2. 1.1. 
4 External temperature 3336 2.1. 1.5 Equation of Hydrostatic Equilibrium 3338 2.1. 1.6 Brightness 3341 2. 1 . 1 .7 Shining (apparent brightness) 3341 2. 1 . 1 .8 Apparent magnitude 3343 2. 1.1. 9 Absolute magnitude 3345 2.1.2 Pulsative Variable Stars 3348 2.1.3 Neutron Stars (magnetars) 3351 2 . 1 . 3 . 1 Chandrasekhar limit 3352 2. 1.3.2 Neutron star magnetic field 3353 2.2 Galaxies 3354 2.2.1 Radial Speed Anamoly 3355 3 Special Relativity 3358 3.1 Assumptions and Principles 3359 3.1.1 Postulate of Invariance 3359 3.1.2 Cosmological Principle 3359 3.1.3 Special Relativity Principle 3360 3.2 Lorentz Transformations/Boost 3362 3.2.1 Displacement four- vector 3366 3.2. 1.1 Wave Equation Invariance 3369 3. 2. 1.2 Hypergeometric interpretation 3370 3.2.2 Velocity four- vector 3372 3.2.3 Current four-vector 3376 3.2.4 Acceleration four- vector 3376 3.2.5 Relativistic sum of velocities 3381 3.2.6 Relativistic lengths variation (length contraction) 3382 xxx 3.2.7 Relativistic time variation (time dilatation) 3384 3.2.7. 1 Hafele-Keating experiment 3386 3. 2.7. 2 Twins paradox 3390 3.2.8 Apparent relativistic mass 3392 3.2.8. 1 Mass-energy equivalence 3397 3. 2. 8. 2 Relativistic Lagrangian 3398 3. 2. 8. 3 Relativistic (linear) momentum 3400 3. 2. 8. 3.1 Einstein relation 3404 3.2. 8.3.2 Time of flight 3405 3. 2. 8. 3. 3 Relativistic force 3407 3.2.9 Relativistic electrodynamics 3410 3.2.9. 
1 Tensor field transformation 3418 3.3 Minkowski space-time 3419 3.3.1 Four-vectors 3421 3.3.2 Universe light cone 3423 4 General Relativity 3430 4.1 Assumptions and Principles 3430 4.1.1 Equivalence Postulates 3430 4.1.2 Mach Principle 3434 4.2 Metrics 3435 4.2.1 Schild Criteria (Einstein red-shift effect Newtonian approach) 3440 4.3 Equations of movement 3443 4.3.1 Geodesic equations 3452 4.3.2 Newtonian Limit 3456 4.4 Stress-Energy Tensor 3459 4.5 Einstein’s Field Equations 3465 4.5.1 Cosmological Constant 3471 4.5.2 Schwarzschild Solution 3473 4.6 Experimental Tests 3486 4.6.1 The Precession of Mercury’s Perihelion 3486 4.6.2 Deflexion of Light 3497 4.6.3 Shapiro Effect (delay) 3502 4.6.4 Black Holes 3509 5 Cosmology 3514 5.1 Newtonian Cosmological Model 3514 5.1.1 Hubble’s Law 3516 5.2 Friedmann Equations 3519 5.2.0. 1 Critical Density 3523 5.3 Cosmological models of Friedmann-Lemaitre 3528 5.3.1 Flat spaces (k = 0) 3528 5.3. 1.1 Flat space dominated by matter 3529 5. 3. 1.2 Flat space dominated by radiation 3531 5.3.2 Spherical spaces (k > 0) 3533 5.3.2. 1 Spherical space dominated by matter 3539 5. 3. 2. 2 Spherical space dominated by radiation 3541 5.3.3 Hyperbolic spaces (kyO) 3544 5.3.4 Matter dominated hyperbolic space 3547 5.3.5 Hyperbolic space dominated by radiation 3550 xxxr 5.4 Observable Universe 3553 5.5 Cosmic Microwave Background (CMB) 3568 6 String Theory 3573 6.1 Wave equation of a transervsal string 3574 6.2 Non-relativistic Wave equation of a transversal string 3576 6.2.1 Nambu-Goto Action 3582 6.3 Lagrangian of a String 3589 12 Chemistry 3593 1 Quantum Chemistry 3594 1.1 Infinite three-dimensional rectangular potential 3594 1.2 Molecular Vibrations 3597 1 .3 Hydrogenoid Atom 3600 1 .4 Rigid Rotator 3604 1.4.1 Potential Profile 3631 2 Molecular Chemistry 3634 2.0. 
1 Orbital Approximations 3635 2.1 LCAO Method 3639 2.2 Molecular Rotational Energy Levels 3648 2.3 Molecular Vibrational Energy Levels 3650 3 Analytical Chemistry 3652 3.1 Simple Mixtures 3653 3.2 Reactions 3655 4 Thermochemistry 3659 4.0. 1 Chemical transformations 3659 4.1 Molar Quantities 3661 4.1.1 Standard enthalpy of reaction 3668 4. 1 . 1 . 1 Kirchhoff ’s Enthalpy Law 3671 13 Theoretical Computing 3675 1 Numerical Methods/ Analysis 3677 1 . 1 Computer Representation of Numbers 3679 1.1.1 Decimal System 3679 1.1.2 Binary system 3680 1.1. 2.1 Binary arithemetics 3681 1.1.3 Hexadecimal System 3681 1.1.4 Octal System 3682 1.1.5 Conversion of decimal system to non-decimal system: . . . .3682 1.2 Algorithm Complexity 3685 1.2.1 NP-Completude 3690 1.3 Integer Part 3692 1 .4 Heron’s Square Root Algorithm 3694 1.5 Archimedes Algorithm 3696 1.6 Euler’s Number e 3698 1.7 Stirling’s factorial approximation 3698 1 .8 Linear System of Equations 3701 1.8.1 One equation with on unknown 3701 1.8.2 Two equations with two unknowns 3701 1.8.3 Three equations with three unknowns 3703 xxxii 1.8.4 n equations with n unknowns 3706 1.9 Polynomials 3707 1.10 Regression Techniques 3710 1.10.1 Univariate linear regression model 3713 1.10.1.1 Regression line 3714 1.10.1.2 Least Squares Method (LSM) 3715 1 . 10. 1 .3 Univariate Regression Variance Analysis 3718 1 . 10. 1 .4 F-test for Regression (significance test for linear re- gression) 3724 1 . 10.2 Univariate linear regression Gaussian Model 3732 1.10.2.1 Pearson Correlation Coefficient Test 3740 1.10.2.2 Confidence interval of predicted values 3742 1.10.3 Linear univariate regression forced through the origin . . . .3749 1.10.4 Deming regression (orthogonal regression) 3750 1. 
10.5 Multiple linear regression Gaussian Model 3756 1.10.5.1 Variance Inflation Factor (multicolinearity) 3764 1.10.6 Polynomial regression 3769 1.10.7 Logistic Regressions (LOGIT) 3771 1.10.7.1 Binomial Logistic Regression 3771 1.10.7.2 ROC and Lift curves 3780 1.11 Interpolation Techniques 3795 1.11.1 Bezier Curves (B-Splines) 3795 1.11.2 Euler Method 3802 1.11.3 Polynomial of collocation 3804 1.11.4 Lagrange polynomial interpolation method 3 807 1.12 Roots search 3809 1.12.1 Proportional parts methods 3809 1.12.2 Bisection method 3811 1.12.3 Secant method (Regula Falsi or False Position) 3814 1.12.4 Newton’s method 3817 1.13 Numerical Differentiation 3822 1.14 Numerical Integration 3824 1.14.1 Rectangles method 3825 1.14.2 Trapezoidal method 3826 1.15 Optimization 3827 1.15.1 Linear programming (Linear Optimization) 3828 1.15.1.1 Graphical LP resolution 3831 1.15.1.2 Algebraic LP resolution 3832 1.15.1.3 Simplex algorithm LP resolution 3838 1.15.2 N onlinear programming (N onlinear optimization) 3844 1.15.2.1 Substitution Method 3847 1.15.2.2 Lagrange Multipliers Method 3848 1.15.2.3 Newton-Raphson Method (Quadratic Newton) . . .3852 1.15.2.4 Gauss-Newton Method (Tangent Newton) 3857 1.16 Resampling statistics 3865 1.16.1 Monte Carlo Simulations 3866 1.16.1.1 Inverse Transform Sampling 3868 1.16.1.2 Random number generation 3870 xxxiii 1.16.1.3 Monte Carlo integration 3874 1.16.1.4 Monte Carlo Estimation of Pi 3875 1.16.1.5 Monte Carlo Modeling 3877 1.16.2 Bootstrapping 3881 1.16.3 Jackknifing (jacknife resampling) 3887 1.17 Finite difference method (F.D.M.) 
3890 1.17.1 One space dimension F.D.M 3890 1.17.1.1 von Neuman stability 3893 1.17.2 Space-time F.D.M (finite-volume method) 3896 1.18 Data Mining 3904 1.18.1 Clustering 3915 1.18.2 Regression and classification trees 3918 1.18.3 K - Means 3927 1.18.4 Hierarchical Ascendant Classification (HAC) Dendrograms .3934 1.18.5 Neural networks 3940 1.18.5.1 Neuron model 3942 1.18.5.2 Transfer functions 3946 1.18.5.3 Network Architecture 3947 1.18.6 Genetic Algorithms 3959 1.18.6.1 Encoding and Initial population 3963 1.18.6.2 Operators 3965 1.18.6.2.1 Operator of selection 3966 1.18.6.2.2 Crossover operator 3967 1.18.6.2.3 Mutation operator 3968 2 Fractals 3974 2.1 IFS Fractals 3976 2.1.1 Fractals Metric Space 3991 2.2 Fractals Visualization 3995 2.2.1 Cantor’s Fractal (Cantor Set) 3995 2.2.2 Triangle Sierpinski Fractal 3997 2.2.3 Sierpinski carpet fractal 4003 2.2.4 Fractal spirals 4007 2.2.5 Von Koch fractal (Koch snowflake) 4009 2.2.6 Natural fractals 4013 2.2.6. 1 Branch 4014 2.2.62 Snowflake 4016 2.2.63 Tree 4019 2. 2. 6.4 Fern 4021 2.3 Escape Time Algorithm Fractals 4024 2.3.1 Mandelbrot set 4026 2.3.2 Julia set 4030 2.3.3 Newton set 4034 3 Fogical Systems 4036 3.1 Strict Fogic 4036 3.1.1 Boolean Algebra 4037 3.1.2 Fogical Functions 4044 3.1.3 Karnaugh maps 4047 4 Error-Correcting Codes 4048 xxxrv 4.1 Checksum 4052 4.1.1 Luhn algorithm 4052 4.2 Check Digit 4054 4.2.1 European Article Numbering (EAN- 13) 4054 4.2.2 Swiss Post Payment slip 4056 4.2.3 International Bank Account Number (IBAN) 4058 4.2.4 UIC wagon numbers 4059 4.3 Permutations 4061 4.4 Encoders 4062 4.4.1 Block code 4066 4.4.2 Systematic codes 4073 5 Automata Theory 4076 5.1 Von Neumann machine 4078 5.2 Turing machine 4079 5.3 Chomsky hierarchy 4083 5.3.0. 
1 Formal language 4083 6 Cryptography 4085 6.1 Cryptographic systems 4086 6.1.1 Kerckhoffs’ principle 4090 6.2 Traps 4091 6.3 Secret-key encryption system 4092 6.3.1 Feistel Schemes 4094 6.4 Public key encryption 4099 6.4.1 Diffie-Hellman protocol 4101 6.4.2 Elliptic Curves Cryptography 4102 6.4.2. 1 Plane curves 4102 6. 4. 2. 2 Plane curves of low degree 4103 6. 4. 2. 3 Rational points on the unit circle 4104 6. 4. 2. 4 Elliptic curves and Bezout’s theorem 4106 6. 4. 2. 5 The addition law on elliptic curves 4107 6. 4. 2. 6 Generating all the rational points 4108 6. 4. 2. 7 Beyond elliptic curves 4109 7 Quantum Computing 4110 7.1 Schrodinger’s Cat superposition 4112 7.2 Photon polarization 4113 7.3 Qubit 4116 7.3.1 Bloch sphere 4122 7. 3. 1.1 Qubit of polarization 4127 7. 3. 1.2 1/2 spin Qubit 4129 7.4 Quantum logic gates 4134 14 Social Sciences 4137 1 Population Dynamics 4139 1.1 Birth rate and mortality tables (biometric features) 4139 1.1.1 Population Renevewal 4148 1.2 Population Models 4150 1.2.1 Exponential model 4150 1.2.2 Deterministic Logistic Model (Verlhust) 4153 xxxv 1.2.3 Chaotic Logistic Model 4157 1.2. 3.1 Feigenbaum’s Bifurcation Diagram 4161 1.2.4 Malthusian Growth Law 4165 1.2.5 Leslie model 4166 1.2.6 SIR Model for Spread of Disease 4168 1.2.7 Lotka- Volterra predator-prey model 4171 1.3 Schaefer’s Optimal capture model 4180 1.4 Hardy-Weinberg model 4182 1.5 Mendel’s law 4188 1.6 Growth rate with temperature 4189 2 Game and Decision Theory 4190 2.1 Behavorial decision bias (cognitive bias) 4194 2.1.1 Sunk Cost 4196 2.1.2 Anchoring Bias 4197 2.2 Utility 4199 2.2.1 Pareto Optimum 4199 2.2.2 Nash Equilibrium 4200 2.3 Games Representations 4203 2.3.1 Extensive representation of a Game 4205 2.3.2 Extensive representation of a Decision 4206 2.3.2. 1 Real Options 4213 2.3.3 Normal representation of a Game 4215 2.3.3. 
1 Repetitive Games 4222 2.3.4 Set representation of a Game 4224 2.3.5 Graphical representation of a Game 4227 2.4 Expected Utility 4231 2.4.1 Hurwitz Criteria 4231 2.4.2 Laplace Criteria 4234 2.5 Evolutionary Game 4235 2.5.1 Dove % Hawk game in pure strategy (without probabilities) . 4238 2.5.2 Dove % Hawk game in stable evolutionary strategy (with probabilities) 4240 2.6 Cournot Competition 4242 2.7 Markov Decision Processes (MDPs) 4246 2.8 Multi-Criteria Decision Making (MCDM) 4252 2.8.1 Analytic Hierarchy Process (AHP) 4252 3 Economy 4263 3.1 Concepts 4263 3.1.1 Microeconomics 4264 3 . 1 . 1 . 1 Average & Marginal Cost/Revenue 4271 3.1.2 Macroeconomics 4276 3 . 1 .2. 1 Cobb-Douglas Model 4277 3.2 Monetary Model 4283 3.2.1 Walras’ law 4285 3.3 Price Index and GDP 4290 3.3.1 Paasche and Laspeyres price indices 4291 3.3.2 Fisher index and Marshall-Edgeworth index 4292 3.3.3 Gross domestic product (GDP) 4293 xxxvr 3.4 Supply and Demand Theory 4295 3.4.1 Expected utility theory 4295 3.5 Net gain/loss opposite feedback model 4301 3.6 Capitalization and Actuarial 4309 3.6.1 Dates Intervals 4311 3.6.2 Rates Equivalence 4314 3.6.3 Simple Interest 4315 3.6.3. 1 Discounts 4316 3.6.4 Compound Interest 4318 3.6.5 Continuous Interest 4320 3.7 Progressive interest (annuities) 4321 3.7.1 Postnumerando annuities 4322 3.7.2 Praenumerando annuities 4325 3.8 Rounding 4327 3.9 Loans Amortization/Repayments 4330 3.9.1 Fixed-Term Loan 4332 3.9.2 Loan with constant amortization 4333 3.9.3 Loan with constant annuity 4333 3.10 Modern Portfolio Theory 4336 3.10.1 No Arbitrage Opportunity (N . A . O . ) 4339 3.10.2 Portfolios 4344 3.10.2.1 Stocks (shares of stocks )/Equities 4352 3.10.2.1.1 Dividend Yield 4354 3.10.2.1.2 Shares Benchmark Indices 4357 3.10.2.1.3 Durand Model 4358 3.10.2.2 Obligations (Bonds) 4362 3.10.2.3 Warrants 4375 3.10.2.4 Futures & Forwards 4379 3.10.2.4.1 Futures & Forwards naive pricing 4384 3.10.2.4.2 Futures & Forwards commodity hedging . 
.4388 3.10.2.4.3 Options 4394 3.10.2.5 Returns and Investments rates 4406 3.10.2.5.1 Return on Investment 4407 3.10.2.5.2 Internal Rate of Return 4408 3.10.2.5.3 Money Weighted Rate of Return (M.W.R.R.) 4409 3.10.2.5.4 Time Weighted Rate of Return (M.W.R.R.) .4412 3.10.2.6 Theory of Speculation 4414 3.10.2.7 Portfolio efficient diversification models 4423 3.10.2.7.1 Overall minimum variance portfolio (Markowitz portfolio) 4424 3.10.2.7.2 Overall minimum Sharpe portfolio 4435 3.10.2.8 Capital Asset Pricing Model (CAPM) 4453 3.10.2.9 Black & Scholes option pricing models 4468 3.10.2.9.1 Put-Call parity equation 4469 3.10.2.9.2 Efficient Market Hypothesis 4472 3.10.2.9.3 Wiener Process 4473 3.10.2.9.4 Generalized Brownian motion 4477 3.10.2.9.5 Brownian bridge 4481 xxxvii 3.10.2.9.6 Ito process 4484 3.10.2.9.7 Black & Scholes Equation 4497 3.10.2.9.8 Self-financing portfolio on underlying . . . .4500 3.10.2.9.9 Greeks and others 4501 3.10.2.9.10 Solving the Black & Scholes equation . . . .4505 3.10.2.10 Binomial Pricing (CRR model) 4517 3.10.2.11 VIX volatility index 4531 3.10.2.12 Value at Risk 4538 3.10.2.12.1 Relative Value at Risk 4539 3.10.2.12.2 Absolute Value at Risk 4544 3.10.2.12.3 Delta-Normal Value at Risk 4546 3.10.2.12.4 Historical Value at Risk 4547 3.10.2.12.5 Credit Value at Risk 4547 3.10.2.12.6 Operational Value at Risk 4549 3.10.2.12.7 Variance-Covariance Value at Risk 4551 3.10.2.12.8 Variance-Covariance Value at Risk 4552 3.10.2.12.9 Back-Testing Value at Risk 4554 3.11 Time Series Analysis 4556 3.11.1 Type of Errors 4561 3.11.2 Decompositions 4562 3.11.3 Types of Forecasting Models 4568 3.11.3.1 S imple Moving Average (moving average smoothing)!5 7 1 3.11 .3.2 Linear Model With Seasonal Coefficients (LMSC) . 
4574 3.11 .3.3 Simple Exponential Smoothing (EWMA) 4579 3.11.3.4 Double Exponential Smoothing with One Parameter (Brown Method) 4588 3.11.3.5 Holt’s Double Exponential Smoothing with 2 Pa- rameters (Additive Method) 4596 3.11.3.6 Holt’s and Winter Triple Exponential Smoothing with 3 Parameters (Multiplicative Method) 4601 3.11.3.7 Logistic Model 4606 3.1 1.4 Autoregressive Models 4613 3. 1 1 .4. 1 AR(p) Autoregressive processes 4622 3.11 .4.2 MA(g) Moving Average stationary process 4624 3.11.4.3 ARMA(p, q) Autoregressive non-seasonal moving average processes 4626 3.11.4.4 ARIMA(p, d, q) Autoregressive non-seasonal inte- grated moving average processes 4627 3.11.5 Durbin- Watson autocorrelation test 463 1 Quantitative Management 4634 4.1 Corporate Finance Management 4637 4.1.1 Basic Accounting Equation 4638 4.1.2 Ratio Analysis 4639 4. 1 .2. 1 Short term solvency or liquidity measure 4640 4. 1.2. 2 Long term solvency or Liquidity measure 4641 4. 1.2. 3 Profability Measures 4642 4. 1.2.4 Growth Rate 4643 4. 1 .2.5 Asset management or turnovers measures 4644 xxxviii 4. 1.2. 6 Market Value Measures 4644 4.1.3 Weighted Average Cost of Capital (WACC) 4646 4.1.4 Break-even Point Analysis (BEPA) 4649 4.1.5 Investment Strategies 4651 4. 1 .5. 1 Net Present Value 4652 4. 1 .5.2 Internal Rate of Return 4657 4. 1 .5.3 Internal Rate of Return 4659 4.1.6 Company Valuation Methods 4659 4. 1.6.1 Balance sheets-based method 4661 4.1. 6.1.1 Book Value 4661 4. 1.6. 1.2 Adjusted Book Value 4662 4. 1.6. 1.3 Liquidation Value 4663 4. 1.6. 1.4 Substantial Value 4663 4. 1.6. 1.5 Book Value and Market Value 4664 4. 1.6. 2 Income Statemented-Based Methods 4664 4. 1 .6.2. 1 Value of Earnings with PER 4665 4. 1.6. 2. 2 Value of Dividends 4666 4. 1.6. 2. 3 Sales Multiples 4667 4. 1.6. 3 Goodwill-Based Methods 4667 4.1.7 Capital Goods 4669 4. 1 .7. 1 Linear Amortization 4669 4. 1.7. 2 Arithmetic Declining Amortization 4670 4. 
1 .7.3 Geometric Declining Amortization (declining balanco)67 1 4.1.8 Wages model 4672 4.2 Project Management 4674 4.2.1 Probabilistic PERT 4674 4.2.2 Project planning variance reduction 4683 4.2.3 Process Reliability 4685 4.3 Lean Management (Six Sigma Process) 4687 4.3.1 Pareto Analysis 4690 4. 3. 1.1 Gini Index 4694 4.3.2 Weighted Ishikawa Diagram 4697 4.4 Supply Chain Management 4701 4.4.1 Supply Chain Management in uncertain future 4703 4.4.2 Optimal initial stock management with zero rotation 4705 4.4.3 Wilson’s Models 4712 4.4.3. 1 Wilson’s model with resupply 4717 4. 4. 3. 2 Wilson’s model without resupply 4726 4. 4. 3. 3 Wilson’s model with resupply and break-up 4728 4.5 Queueing Theory 4733 4.5.1 M/M/ . . . arrival times modelisation 4737 4.5.2 M/M/ . . . service times modelisation 4742 4.5.3 Kendall queues notation 4744 4.5.4 Modeling of arrivals and departures M/M/1 4746 4.5.5 Probability of standby in &M /M /k/k queue (Erlang-B formula)753 4.5.6 Probability M/M/ K / + oo of standby (Erlang-C formula) . . 4756 4.6 Insurance 4761 4.6.1 Premium pricing 4763 xxxix 4.7 Sensitivity Analysis 4770 4.7.1 Direct Bias Method 4772 4.7.2 Correlation Method 4776 5 Music Maths (physics of hearing) 4781 5.1 Longitudinal Sound Waves 4781 5.1.1 Power carried by a sound wave 4788 5.1.2 Measuring the intensity of a sound 4792 5.2 Spherical Sound Waves 4794 5.3 Doppler effect 4795 5.3.1 Fixed source-Moving observer 4796 5.3.2 Moving source-Fixed observer 4797 5.3.3 Moving source and observer 4799 5.4 Shockwaves 4800 5.5 Music Scales 4802 5.6 Harmonic Oscillator 4805 5.6.1 Damped oscillator 4807 15 Engineering 4809 1 Marine & Weather Engineering 4811 1 . 1 Visual horizon 4813 1.2 Wind direction 4817 1.3 Atmospheric Profile Models 4820 1.3.1 Atmospheric Exponential Profile Model 4820 1.3.2 Adiabatic Atmosphere Model 4824 1.3. 2.1 Hypsometric equation 4826 1 .4 Planetary equilibrium temperature 4828 1 .4. 
1 Greenhouse effect 4830 1.4.2 Milankovitch cycles 4831 1.5 Weather (sounding) balloon 4832 1.6 Cyclogenesis and Anticyclogenesis 4836 1.7 Tides 4845 1.7.1 First approach 4846 1.7.2 Second approach 4849 1.8 Lorenz equation 4856 1.8.1 Rayleigh-Benard convection cells (Benard-Marangoni insta- bility) 4866 1.8.2 Lorenz attractor and chaos 4874 1.9 Waves 4881 1 .9. 1 Depth of a wave 4892 1.9.2 Wave’s amplitude 4893 2 Mechanical Engineering 4895 2.1 Gears 4895 2.1.1 Transmission ratios 4899 2.1.2 Gears association 4902 2. 1.2.1 Odd/Even Gear "problem" 4906 2.1.3 Type of Gears 4908 2.2 Strength of materials 4913 2.2.1 Quadratic moments 4916 xl 2.2.2 Equation of the elastic line 4918 2.2.2. 1 Euler-Bernoulli Beam equation 4925 2. 2. 2. 2 Potential elastic energy 4931 2.2.3 Torsion 4933 2.2.4 Buckling 4937 2.2.5 Traction 4942 3 Electrical Engineering 4943 3.1 Elementary Primitive Electrical Symbols 4944 3.2 Alternative current VS Direct current 4951 3.2.1 Average power 4953 3.3 Transformers 4957 3.3.1 Transformer universal EMF equation 4963 3.4 Steady State linear circuits 4964 3.4.1 RC series circuit 4964 3.4.2 RL series circuit 4967 3.4.3 RLC circuit 4970 3.4.3. 1 Critically damped response 4972 3. 4. 3. 2 Overdamped response (hypercritic) 4973 3. 4. 3. 3 Underdamped response (decaying oscillation) . . 
.4975 3.5 Linear circuit in forced regime 4980 3.5.1 Low-pass filter 4982 3.5.2 High-pass filter 4984 3.5.3 Integrator and differentiator 4985 4 Civil Engineering 4987 4.1 Static 4988 4.2 Pulleys 4989 4.2.1 Windlass 4995 4.3 Cornu spiral 4997 4.4 Overhead cable 5002 4.4.1 Free overhead cable (catenary) 5003 4.4.2 Charges overhead cable (suspended bridge) 5010 4.4.3 Very tense cable 5013 4.5 Falling chimney (naive approach) 5015 4.6 Dams 5021 5 Aerospace Engineering 5024 5.1 Airfoil Lift 5025 5.1.1 Newton’s lift argument (skipping stone argument) 5027 5.1.2 Bernoulli’s lift argument (equal time argument) 5028 5.1.3 Euler’s lift argument 5029 5.1.4 Coanda lift argument 5030 5.1.5 Kutta-Joukowski lift argument 5030 5.2 Cosmic speeds 5031 5.3 Fundamental Equation of Propulsion (Tsiolkovsky rocket equation) . . 5033 5.4 Geostationary orbit 5040 5.5 Vis- Viva Equation 5044 5.6 Hohmann Transfer orbit 5046 6 Software Engineering 5048 6.1 Algorithm 5048 xli 6.2 Dichotomic Search algorithm 5049 6.2.1 Bisection algorithm 5049 6.2.2 Binary search algorithm 5052 6.3 Tower of Hanoi algorithm 5054 6.4 Sorting Algorithms 5060 6.4.1 Bubble sort 5060 6.4.2 Quicksort algorithm 5062 6.5 Dijkstra’s algorithm 5064 6.6 Google PageRank algorithm 5070 6.6.1 Weighted Count 5072 6.6.2 Recursive counting 5073 6.6.3 Absorbing states 5078 7 Industrial Engineering 5081 7.1 Six Sigma 5082 7.1.1 Quality Control 5084 7.1.2 Defaults/Errors 5085 7.1.3 Capability Indices 5090 7.1.4 Quality Levels 5103 7.2 Taguchi Model 5116 7.3 Preventive Maintenance 5122 7.3.1 Planned Obsolescence 5123 7.3.2 Reliability Empirical Estimators 5125 7.3.2. 1 Average Failure Rate 5140 7.3.3 Weibull Distribution 5141 7.3.3. 1 Two-parameter Weibull distribution linearization . .5146 7.3.4 Topology of Systems 5148 7.3.4. 1 Fault Tree Analysis 5160 7. 3.4. 
2 Markov Chain Reliability Model 5162 7.3.5 Maximum Likelihood for failure rate determination of samples5166 7.3.6 Kaplan-Meier S urvi val Rate 5168 7.3.7 ABC Method 5173 7.4 Design of Experiments (DoE) 5178 7.4. 1 Two levels factorial Designs 5188 7.4. 1 . 1 Replicated full factorial designs 5195 7. 4. 1.2 Plackett-Burman Designs 5201 7. 4. 1.3 Fractional Factorial Designs 5206 7.4.2 General factorial Designs 5216 7.4.3 Taguchi Designs and Nomenclature (robust designs) 5230 7.4.4 Response Surface Methodology (Box Domains) 5239 7.4.4. 1 Pure quadratic curvature test 5241 7. 4.4. 2 Box- Wilson Central Composite Designs 5243 7.4. 4. 2.1 Circumscribed Center Designs 5246 7.4. 4. 2. 2 Face Centered Designs 5250 7.4.5 Optimal Designs 5254 7.4.6 Mixture Design 5263 7.4.6. 1 Network mixture designs (simplex lattice designs) . 5266 7. 4. 6. 2 Full Factorial Combined with Mixture Design- Crossed Design 5282 xlii 7.4.7 General DoE diagnostic tools 5285 7.4.7. 1 Lenth’s PSE Pareto Margin Error for unreplicated factorial designs 5285 7. 4.7. 2 Pareto Margin Error for replicated factorial designs . 5288 7. 4. 7. 3 Desirability 5289 7.5 Quality Control on Reception (Lot Acceptance Sampling Plans) .... 5293 7.5.1 Simple acceptance sampling plan by measurement for a unique tolerance with known standard deviation 5296 7. 5. 1.1 Calculation of the parameters using the norms AF- X06-023 5304 7.5.2 Simple acceptance sampling plan by attribute 5307 7.5.2. 1 Calculation of the parameters using the norm ISO 2859-1 5313 7.5.3 Double acceptance sampling plan by attribute 5315 7.5.4 Operating characteristic curve (OC) 5316 7.5.5 Average outgoing quality (AOQ) 5319 7.6 Quality Control Charts (CC) 5323 7.6.1 WECO’s empirical rules 5327 7.6.2 Sample size and Sampling frequency for Control Charts . . . 5329 7.6.3 Attributes Control Charts (qualitative CC) 5331 7.6.3. 1 P Control Charts (binomial proportion CC) 5333 1.63.2 NP Control Charts (binomial counting CC) . . . 
.5337 1.633 C Control Charts (Poisson counting CC) 5340 7. 6. 3. 4 U Control Charts (normalized Poisson) 5344 1.63.5 Laney’s p' and u' control charts 5348 7.6.4 Measurement Control Charts (quantitative CC) 5350 7.6.4. 1 Individual measurement control chart with required limits 5351 7. 6.4. 2 Individual measurement control chart with moving limits 5352 7. 6.4. 3 Subgroups measurement control chart with standard error 5358 7. 6.4. 4 S — S Subgroups measurement control chart for standard deviation 5363 7. 6.4. 5 X — S Subgroups measurement control chart . . . .5371 7. 6.4. 6 X — Sp Subgroups measurement control chart with pooled variance 5375 7. 6.4. 7 R — R Subgroups measurement control chart .... 5380 7.6.5 Autocorrelated Measurement Control Charts (time weighted control charts) 5390 7.6.5. 1 I — MR/X Individual moving range measurement control chart 5390 7. 6. 5. 2 I — MR/MR Individual moving range measurement control chart 5394 1.6.53 Individual Moving Average control chart 5397 7. 6. 5. 4 CUSUM (cumulated sum) control chart with empir- ical V-mask 5402 xliii 7. 6. 5. 5 EWMA control charts (exponential weighted mov- ing average) with fixed limits 5410 7.6.6 Rare events control charts 5417 7.6.6. 1 Frequency T control chart with probabilistic limits .5418 7. 6. 6. 2 Frequency G control chart of rare events 5422 7.6.7 Control Charts Operating Characteristic (OC) Curves 5427 7.6.7. 1 OC for X measurement control charts 5427 7. 6. 7. 
2 OC for P-type attribute control charts 5429 7.7 Design of reliability tests 5431 7.7.1 Chi- squared time of test 5432 7.7.2 Binomial sampling size 5434 7.7.3 Beta-binomial sampling size 5435 16 Epilogue 5439 17 Biographies 5441 A 5442 B 5444 C 5451 D 5459 E 5463 F 5466 G 5470 H 5475 I 5483 J 5483 K 5486 F 5489 M 5497 N 5504 0 5508 P 5509 R 5514 S 5516 T 5523 V 5526 W 5528 Y 5531 Z 5533 18 Chronology 5534 19 Humour 5565 1 Situations 5566 2 Mathematics 5579 3 Physics 5597 4 Statistics 5613 5 Chemistry 5616 6 Engineering 5622 xliv 7 Computing 5630 8 Social Sciences 5639 20 Links 5645 1 Exact Sciences 5646 2 Publishing/Magazines 5647 3 Associations 5649 4 Jobs 5650 5 Television/Radio 5650 6 Other sciences 5651 7 Softwares/Applications 5652 21 Quotes 5655 22 Change Log 5662 23 Nomenclature 5680 List of Figures 5684 List of Tables 5719 List of Algorithms 5725 Bibliography 5727 Index 5733 24 Donate 5783 xlv Dedicated to Mother Nature Warnings Contents 1 Impressum 3 1.1 Use of content 3 1.2 How to use this book 4 1.3 Data Protection 7 1.4 Use of data 7 1.5 Data transmission 7 1.6 Agreement 7 1.7 Errata 7 2 License 8 2. 1 Preamble 8 2.2 Applicability and Definitions 9 2.3 Verbatim Copying 10 2.4 Copying in Quantity 10 2.5 Modifications 11 2.6 Combining Documents 12 2.7 Collections of Documents 13 2.8 Aggregation with independant Works 13 2.9 Translation 13 2.10 Termination 13 2. 1 1 Future revisions of this License 14 3 Roadmap 15 2 Impressum 1.1 Use of content The contents of this book are elaborated by a development process by which volunteers reach a consensus. This process that brings together volunteers, research also the point of view of people interested in the topics of this book. The person in charge of this book administers the process and establishes rules to promote fairness in the consensus approach. 
This person is also responsible for drafting the text and, sometimes, for testing, evaluating or independently verifying the accuracy or completeness of the presented information.

We decline all responsibility for any injury or damage of any kind, whether special, incidental, consequential or compensatory, arising from the publication, application or reliance on the content of this book. We make no express or implied warranty as to the accuracy or completeness of any information published in this book, and we do not guarantee that the information contained in this book meets any specific need or goal of the reader. We do not guarantee the performance of the products or services of any manufacturer or vendor solely by virtue of the content of this book.

The technical descriptions, procedures and computer programs in this book have been developed with care; however, they are provided without warranty of any kind. We also make no warranty that the equations, programs and procedures in this book or its associated software are free of errors, are consistent with any particular standard of merchantability, or will meet your requirements for any particular application. They should not be relied upon for solving a problem whose incorrect solution could result in injury to a person or loss of property. Any use of the content of this book is at the reader's own risk. The authors, editors and publisher disclaim all liability for direct, incidental or consequential damages resulting from the use of the content of this book or the associated software.

By publishing these texts, this book does not intend to provide services on behalf of any person or entity, or to perform any task that any person or entity is expected to accomplish for the benefit of a third party. Anyone using this book should rely on their own independent judgment or, where appropriate, seek the advice of a qualified expert to determine how to exercise reasonable care in all circumstances.
The information and standards on the topics covered by this book may be available from other sources, which the reader may wish to consult for other points of view or for additional information not covered by this book. We have no power to enforce compliance with the contents of this book, and we do not undertake to monitor or enforce such compliance. We do not carry out any certification, testing or inspection of products, designs or installations with regard to the safety or health of persons and property. Any certification or other statement of compliance regarding the health or safety of persons and property mentioned in this book cannot be attributed to the content of this book and remains the responsibility of the certification center or of the party concerned.

1. Warnings EAME v3.5-2013

1.2 How to use this book

At the university level, this book can be used for a Ph.D., graduate or advanced undergraduate seminar in many fields of the exact and pure sciences. The seminars in which we use this material are part of the Scientific Evolution Sàrl program, where the trainees have typically already taken undergraduate or graduate courses in their respective specializations. In fact, this book also aims to cover the full curriculum from kindergarten to PhD.

Because the methods of Applied Mathematics are learned through practice and experience, we view a seminar on Applied Mathematics as a learning-by-doing (project-oriented) seminar. We structure our mathematical modeling seminars around a set of problems that require the trainee to construct models that help with planning and decision making. The imperative is that the models be consistent with the theory and back-tested. To fulfill this imperative, the trainee must combine mathematical theory with modeling. As a result, the trainee learns the theory and, more importantly, learns how that theory is applied and combined in the real world.
The ability to criticize mathematical tools and to identify their limitations and dangers is the most valuable feature of our seminars. The problems with solutions in this book provide the opportunity to apply the text material to a comprehensive set of fairly realistic situations. By the end of the seminars, the trainees will have enhanced their skills and knowledge of the most important theoretical and computing tools. These valuable skills are in demand by businesses at the highest levels.

It is very difficult to cover all the material in this book in one semester, as it takes a lot of time to explain the concepts to the trainees. The reader is encouraged to pick and choose which topics to cover during the term. It is not strictly necessary to cover them in sequence, but doing so can help significantly.

In a nutshell, this book offers you a wide variety of topics that are amenable to modeling. All are practical.

1.2.1 Ancillaries

We offer an array of ancillaries for students, instructors and practitioners. First, there are some free companion eBooks and tools in French and English, written by Vincent ISOZ and Daname KOLANI, for people who want to put into practice the theory presented in this book:

• MATLAB™ in English (1,339 pages): http://www.sciences.ch/dwnldbl/divers/Matlab.pdf
• Maple in French (99 pages): http://www.sciences.ch/dwnldbl/divers/Maple.pdf
• R in French (1,626 pages): http://www.sciences.ch/dwnldbl/divers/R.pdf
• Minitab in French (1,092 pages): http://www.sciences.ch/dwnldbl/divers/Minitab.pdf
• Scientific Linux Installation & Configuration (211 pages): http://www.sciences.ch/dwnldbl/divers/ScientificLinux.pdf

4/5785 info@sciences.ch

Second, we offer a few quizzes and flashcards in French and English to challenge your students, or yourself against the rest of the world:

• MATLAB™ Basics L1 Challenge level in French (100 questions): http://www.scientific-evolution.com/qcm/start_session/a73647cf3b/
• Astronomy/Astrophysics HI Challenge level in English (100 questions): http://www.scientific-evolution.com/qcm/start_session/ffd0810fa0/
• Greek Letter Flashcards (48 cards): http://www.scientific-evolution.com/qcm/fr/start_session/6d9f1fef90/
• Common Derivatives Flashcards (29 cards): http://www.scientific-evolution.com/qcm/fr/start_session/c15a40f2c4/
• Common Primitives Flashcards (60 cards): http://www.scientific-evolution.com/qcm/fr/start_session/ccfc20fdef/
• Common Trigonometric Identities Flashcards (68 cards): http://www.scientific-evolution.com/qcm/fr/start_session/882f9696cd/
• LaTeX L3 Challenge level in French (100 questions): http://www.scientific-evolution.com/qcm/fr/start_session/ff1e1d1b91/
• R Software 3.1.2 L3 Challenge level in French (100 questions): http://www.scientific-evolution.com/qcm/fr/start_session/2a6fca7473/
• C++ L3 Challenge level in French (100 questions): http://www.scientific-evolution.com/qcm/fr/start_session/e031ce4b43/

Since any technical book should have a forum, readers can use the following link for any discussion about the content of this book:

https://www.physicsforums.com

For those who prefer social networks, we also have a dedicated Facebook group:

https://www.facebook.com/groups/opera.magistris

Or, for more fun (science pics, quotes, jokes, videos, etc.), there is also an associated Instagram account:

https://www.instagram.com/opera.magistris/

And a selection of what we consider to be interesting scientific videos is available on our YouTube channel:

https://www.youtube.com/user/AdminSciences

As with this book, the companion books above are only samples of the complete versions. The full versions, with perpetual free updates, are available for $299.- each, and for $499.- you also get the exercise files and LaTeX sources (for purchase information, simply send me an email).

Because this book focuses mainly on the mathematical aspects of physical phenomena, we strongly recommend another free book which, in our view, is currently the best one focusing on the popular-science aspects of the subjects we cover: Motion Mountain by Dr. Christoph Schiller: http://www.motionmountain.net

1.3 Data Protection

When you browse the companion website (Sciences.ch), some data are saved automatically. We try to save as little data as possible, for as short a time as possible. Wherever we can, we save only anonymous data. We undertake to process the data you send us personally with the utmost diligence. However, your IP address, the source page that brought you to Sciences.ch and the associated keywords are freely available to everybody on the site for the current month, after which the detailed data are destroyed. You can object to the publication of your data at any time by contacting us.

1.4 Use of data

Your data are only used for sending the Sciences.ch newsletter. Providing personal data other than the e-mail address, title and name is optional. When registering for the newsletter, you can of course specify an alternate address and/or a fictitious name.

1.5 Data transmission

We will never sell or commercialize the data of our customers or interested parties, and we will never infringe on their rights.
In addition, we will not rent mailing lists and will not send you advertising from third parties or on our behalf.

1.6 Agreement

When you provide us with personal information, you authorize us to save and use it within the meaning of the Swiss Federal Law on Data Protection. If you ask us not to send you e-mails, we are obliged, in your interest, to save your e-mail address in an internal negative list.

1.7 Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in this book (for example in the text, scripts or illustrations), we would be grateful if you would report it to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. Our e-mail address is given in the footer of every page of this book. Once your errata are verified, your submission will be accepted and the correction will be listed in the change log of the updated versions.

License

The entire contents of this book are subject to the GNU Free Documentation License, which means:

• that everyone has the right to freely use the texts for non-commercial usage (Google Ads or any equivalent being considered as a commercial usage!)
• that any person is authorized to broadcast items for non-commercial usage (Google Ads or any equivalent being considered as a commercial usage!)
• that anyone can freely edit the texts for non-commercial usage (Google Ads or any equivalent being considered as a commercial usage!)

and bla bla bla... in accordance with the license described below:

Version 1.1, March 2000
Copyright (C) 2000 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
2.1 Preamble

The purpose of this License is to make a manual, textbook, or other written document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, for non-commercial purposes only. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.

2.2 Applicability and Definitions

This License applies to any manual or other work that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you".

A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject.
(For example, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License.

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License.

A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, whose contents can be viewed and edited directly and straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup has been designed to thwart or discourage subsequent modification by readers is not Transparent. A copy that is not "Transparent" is named "Opaque".

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML designed for human modification. Opaque formats include PostScript, PDF, proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML produced by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.

2.3 Verbatim Copying

You may copy and distribute the Document in any medium, noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.

2.4 Copying in Quantity

If you publish printed copies of the Document numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a publicly-accessible computer-network location containing a complete Transparent copy of the Document, free of added material, which the general network-using public has access to download anonymously at no charge using public-standard network protocols. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

2.5 Modifications

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

• Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document).
You may use the same title as a previous version if the original publisher of that version gives permission.
• List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has less than five).
• State on the Title page the name of the publisher of the Modified Version, as the publisher.
• Preserve all the copyright notices of the Document.
• Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
• Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
• Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.
• Include an unaltered copy of this License.
• Preserve the section entitled "History", and its title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
• Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
• In any section entitled "Acknowledgements" or "Dedications", preserve the section's title, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
• Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
• Delete any section entitled "Endorsements". Such a section may not be included in the Modified Version.
• Do not retitle any existing section as "Endorsements" or to conflict in title with any Invariant Section.
• If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.
• You may add a section entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties, for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.
• You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.
• The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

2.6 Combining Documents

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections entitled "History" in the various original documents, forming one section entitled "History"; likewise combine any sections entitled "Acknowledgements", and any sections entitled "Dedications". You must delete all sections entitled "Endorsements."

2.7 Collections of Documents

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

2.8 Aggregation with Independent Works

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, does not as a whole count as a Modified Version of the Document, provided no compilation copyright is claimed for the compilation. Such a compilation is named an "aggregate", and this License does not apply to the other self-contained works thus compiled with the Document, on account of their being thus compiled, if they are not themselves derivative works of the Document. If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one quarter of the entire aggregate, the Document's Cover Texts may be placed on covers that surround only the Document within the aggregate. Otherwise they must appear on covers around the whole aggregate.

2.9 Translation

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of the corresponding section about transformation. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License provided that you also include the original English version of this License. In case of a disagreement between the translation and the original English version of this License, the original English version will prevail.
2.10 Termination

You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

2.11 Future revisions of this License

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.

Please consider the environment before printing

Roadmap

This book follows a simple progression rule: one new A4 page per day since May 2001, on subjects that interest the supervisor of the Sciences.ch distribution of the book Opera Magistris.
The following subjects are already planned for the near or far future, with the same level of detail and the same pedagogical approach in the proofs:

• Probabilities:
  - Bayesian conjugation for Normal and Binomial laws
• Statistics:
  - Mode and Median of statistical laws
  - Semi-variance
  - Partial and semi-partial correlation
  - M-Estimators for localization and for dispersion
  - Likelihood of censored data
  - Jensen Inequality
  - Normal Law Entropy
  - Maximum likelihood Test
  - Propensity score
  - Equivalence test
  - Quasi-correlation matrix
  - Factorial Analysis
  - Hotelling T-Test
  - Welch Test with Welch-Satterthwaite equation
  - ANCOVA
  - Wald-Wolfowitz Test (binary sequence)
  - Levene-Wolfowitz Test (continuous up/down sequence), also named "turning point test" or "trend test"
  - Odds Ratio and its confidence interval
  - Risk Ratio and its confidence interval
  - Ellipse of control
  - Poisson Model for the average (2D) spatial distance
  - Canonical Correlation
  - Intraclass correlation coefficient (ICC)
  - G-test of periodicity
  - Gaussian and Student copula
  - Hierarchical Fixed Factor ANOVA
  - Latin Square ANOVA without replication
  - Introduction to MANOVA
  - Extreme Value Theorem
  - Survey Theory
• Sequences and Series:
  - Properties of Fourier transforms
  - Laplace Transform
  - Z transform (common Z transforms, inverse common Z transforms)
• Differential Calculus:
  - O.D.E. classification
  - Lebesgue Integral with numerical application in MATLAB™
  - Laplace Method
  - Continuous and Discrete Convolution
• Functional Analysis:
  - Convexity and Concavity of a function
• Complex Analysis:
  - Residue Theorem for polynomial ratios
• Topology:
  - Mahalanobis Distance
• Analytical Geometry:
  - Classification of ellipses with the determinant
• Differential Geometry:
  - Normal coordinates
  - Gauss curvature
  - Isoperimetric plane theorem
• Mechanics:
  - Magnus effect
• Optical Wave:
  - Fresnel Diffraction
  - Fraunhofer Diffraction
• Astronomy:
  - MacCullagh's formula
  - Body flatness indirect calculation
  - Synchronous locking of tidally evolving satellites
• General Relativity:
  - Real volume of an object in General Relativity
  - Einstein radius derivation
  - Gravitational Waves
• Cosmology:
  - Friedmann-Lemaître metric derivation
• Chemistry:
  - Molecular Rotational Energy and Electron Transitions
  - Vibrational Energy of Molecules
  - Vibrational plus Rotational Energy of Molecules
• Numerical Methods:
  - Univariate optimization problem with substitution method
  - Acceptance/Rejection Sampling
  - Gibbs Sampling
  - Outliers vs Influential values
  - Generalized Linear Models (Gauss, Poisson, Negative Binomial, Gamma)
  - Logistic regression based on maximum likelihood
  - Cronbach coherence indicator
  - Linear discriminant Analysis
  - Quadratic discriminant Analysis
  - Multidimensional scaling (MDS)
  - Linear Mixture Model (LMM)
  - Kernel Smoothing
  - Mean Shift
  - PLS Regression
  - Factorial Analysis
  - Correspondence Factorial Analysis
  - Generalized Reduced Gradient (GRG) optimization method
• Mechanical Engineering:
  - Self-buckling (tallest column problem)
• Industrial Engineering:
  - Box Domains
  - Central Composite Design
  - Center Face Cube Design
  - Cox Survival Model (Cox Proportional Hazard Model)
  - Structural Equation Modeling
  - Accelerated life testing
• Electronics:
  - Microelectronics
• Finance:
  - Continuous Yield rate
  - Zero-Coupon curve rates
  - Equivalence of a bond rate for a treasury bond
  - Spot rate and Forward rate
  - Adjusting the beta of a portfolio with Futures
  - Cox-Ingersoll Future/Forward price equality
  - Solution of Black & Scholes ODE
  - Black Model
  - Macaulay Duration
  - Modified Duration
  - Modified Internal Rate of Return (MIRR)
  - Binomial Tree (Cox-Ross-Rubinstein)
  - Options Portfolio hedging
    * Protective Put/Call
    * Bull Spread/Call
    * Bear Spread/Call
    * Butterfly
    * Straddle
    * Strangle
    * Collar
    * Calendar spreads
    * Portfolio allocation methods
      · Optimal weighted portfolio for balanced risk
      · Optimal weighted portfolio for tracking error
      · Optimal weighted Sharpe's portfolio
      · Optimal weighted portfolio with maximum diversification
      · Optimal market-bench weighted Treynor-Black Portfolio
  - Surplus at Risk (SVaR)
  - Default Credit Risk (based on Standard & Poor rating)
  - VaR Equity Coverage
  - Conditional VaR loss (CVaR)
  - Fokker-Planck equation
  - ARCH-GARCH stochastic process
  - Vector autoregressive models for multivariate time series
• Quantitative Management:
  - Gale-Shapley Algorithm
  - Newsvendor problem
  - Bullwhip Effect
  - Condorcet paradox
  - Computerized Relative Allocation of Facilities Technique (CRAFT)
  - Real options
  - Procedural Hierarchical Analysis
  - Deferred capital in case of survival (life assurance)
  - Modified Duration
  - Deferred temporary death insurance (life assurance)

Remember that the LaTeX sources of this book can currently be obtained, depending on your donation, on Patreon, PayPal or Tipee.

Like every robust product, this book has a lifecycle.
The lifecycle begins when a product is released and ends when it is no longer supported. Knowing the key dates in this lifecycle helps you make informed decisions about when to upgrade. This book has the following lifecycle: a new major or minor version is published on the 1st of every month (Gregorian calendar) and can be downloaded by clicking on the following button (270 MB PDF):
DOWNLOAD eBook
or, if this link does not work, a copy of the PDF is available on the Internet Archive:
https://archive.org/details/OperaMagistris
To cite this book:
@book{OperaMagistris2013v3,
  author = {Vincent Isoz and Leon Harmel},
  title = {Opera Magistris - Elements of Applied Mathematics for Engineers},
  year = {2014},
  publisher = {Sciences.ch},
  keywords = {science, physics, maths, engineering, finance, management},
  isbn = {978239909327},
}
2 Acknowledgements
The ideas in this book have been developed and reinforced by many people. I have greatly benefited from my regular interactions with hundreds of executives from all backgrounds, including CEOs, CFOs and PMs of many companies around the world, through teaching sessions, the development of company-specific programs, consulting, and even informal conversations. I am grateful to them for sharing their wisdom with me and for inspiring many of the ideas in this book. This book and its companion website would not have been possible without the valuable support of the people mentioned below.
May they find here the expression of my gratitude (and, of course, if some errors remain in this book, it is obviously their fault...):
• Harmel Leon (†2012), graduate electrical engineer specialized in electronics and automation, in charge at the physics research laboratory of ACEC in Charleroi (BEL), for providing the documentation used in the sections on Corpuscular Quantum Physics, Wave Quantum Physics, Quantum Field Theory, Spinor Calculus and General Relativity.
• Legrand Mathias, Ph.D., Ecole Centrale de Nantes (FRA), for his help in writing the first 550 pages of the LaTeX eBook version of the website.
• Ricchiuto Ruben, engineering degree in Physics HES (B.Sc.) from the Engineering School of Geneva (CHE) and mathematician from the University of Geneva, for his valuable help in plasma physics, electromagnetism, quantum physics, statistics, topology, quantum chemistry, fractal theory, analysis and many other areas of pure mathematics and computing.
• The regular participants of the Les Mathematiques.net and Futura-Sciences.com forums, for their valuable assistance in many areas of mathematics and physics. The debates and discussions that took place on these forums help to constantly improve the educational quality of this book.
• The Wikipedia and PlanetMath websites, from which I have borrowed many passages almost word for word (and this is mutual...).
Thanks also to all the readers, webmasters and teachers whose websites and quality documents are available for free and anonymously on the Internet, and to the regular forum participants. I have sometimes reused their explanations verbatim when they required no additions or corrections. It is probably needless to say that you should not assume that these people fully agree with the scientific views expressed in this book, nor that they are responsible for any errors or obscurities that you might find in it.
Thanks also to the few colleagues and customers who were willing to give me their comments to improve the content of this book. However, it can certainly still be improved on many points.
Finally, I would especially like to thank all my family for their continued support and my friends for their patience while I was almost completely absent, with a special thanks to my Dad and Mom for all their incredible help and support over the last months of the translation of this book! I would also like to apologize to some of my customers and colleagues, because I answered their e-mails and phone calls very slowly for thirteen months in order to focus on the translation of this book. Thanks also to my girlfriend for always being there to take care of me when I forget to take care of myself...
For any public feedback or comment, you can use the guestbook associated with this PDF (for questions, please use the forum!): http://www.sciences.ch/htmlen/guestbook.php or, if you want to give private feedback or comments, you can contact me by e-mail.
Introduction
Contents
1 Forewords 24
2 Methods 31
2.1 Descartes' Method 34
2.2 Archimedean Oath 35
2.3 Scientific Publication Rules (SPR) 36
2.4 Scientific Mainstream Media communication 38
3 Vocabulary 39
3.1 On Sciences 40
3.2 Terminology 43
4 Science and Faith 46
This book, whose first edition was published in 2001, is designed so that the knowledge required to read it is as basic as possible. It is not necessary to have a Ph.D. to consult it; you just need to know how to reason, think critically and observe, and to have time...
"Simplicity is the seal of truth and it radiates beauty" Albert EINSTEIN
Forewords
No human endeavor has had more impact on our lives and on our conception of the world and of ourselves than Science¹. Its theories, conquests and results are all around us. Omnipresent in industry (aerospace, imaging, cryptography, transportation, chemistry, algorithmics, etc.)
or in services (banking, fintech, insurance, human resources, projects, logistics, architecture, communications, etc.), Applied Mathematics also appears in many other areas: surveys, risk modeling, data protection, politics, etc. Applied Mathematics influences our lives (telecommunications, transport, medicine, meteorology, music, project management) and contributes to the resolution of current issues (energy, health, environment, climate, optimization, sustainable development, etc.) much more than any soft-skill technique or methodology! Its great success lies in its remarkable spread in the real world and its increasing integration into all human and artificial intelligence activities. We are therefore heading toward a situation where mathematicians and engineers will no longer have a monopoly on mathematics, and where almost any graduate job will require advanced mathematics.
As a former engineering student, I often regretted the absence of a single book that was fairly comprehensive, detailed (without going to extremes...), educational and, if possible, free (!) and portable (being personally a fan of eBooks...), containing at least a non-exhaustive overview of the Applied Mathematics curriculum of engineering schools, together with an overview of what is really used in companies, with proofs that are more intuitive than rigorous but detailed enough to spare the reader unnecessary effort. I also wanted a book that does not require the reader to adopt a new notation or terminology specific to each author (when it does not switch outright to a foreign language...), and where anyone can suggest improvements or additions (through the forum, guestbook or by e-mail).
I was also frustrated during my studies at quite often having to swallow "formulas" or "laws" that were supposedly (and wrongly) unprovable or too complicated, as my teachers said, and I was even disappointed by books from renowned authors (where developments are left to the reader or as an exercise and no real applications are even mentioned...). In this book, the will never to confuse the reader with empty sentences such as "it is evident that...", "it is easy to prove that...", "we leave it to the reader as an exercise..." predominates, since all developments are presented in detail. But I am not a purist of maths! I have only one ambition: to explain things in the simplest possible way.
¹From the Latin scientia, "knowledge, a knowing, expertness", itself from sciens (genitive scientis), meaning "intelligent, skilled", present participle of scire, "to know", which probably originally meant "to separate one thing from another, to distinguish", related to scindere, "to cut, divide".
Although I have to admit that proving some of the mathematical relations presented in engineering school curricula is not always possible, because of a lack of time in the official program or the size limits of a book, I cannot accept that a teacher or author tells his students (respectively, his readers) that certain laws are unprovable (because most of the time this is not true!) or that such or such a proof is too complicated without giving a reference (where the student can find the information necessary to satisfy his curiosity) or at least a simplified but satisfactory proof.
Moreover, I think it is totally archaic today that some teachers continue to ask their students to take massive quantities of notes during classes.
It would be much more beneficial to distribute a course handout containing all the details, in order to concentrate with the students on the essential points, that is to say oral explanations, interpretation, understanding, reasoning and practice, rather than excessive copying from the blackboard... Obviously, when given a complete course handout, some students will be conspicuous by their absence but... so much the better! Thus, those who are passionate can study subjects in depth at home or at the university library, the weak will do what they have to do, and the rest (struggling but hard-working students) will follow the course and take advantage of it to ask the teacher questions rather than mindlessly copying a blackboard.
Inspired by a learning model from an American scholar whose name I have forgotten (...), this book proposes and requires the following skills from the reader: discover, memorize, cite, integrate, explain, restate, infer, select, use, decompose, compare, interpret, judge, argue, model, develop, create, search and reason, in a clear and progressive teaching approach that develops analytical skills and open-mindedness.
So, in my mind, this non-exhaustive book (and its associated companion PDFs) should serve, free of charge for all students and employees around the world, as a substitute for many references and should fill many gaps of the school system, so that no curious student remains frustrated for years during his academic curriculum. Otherwise, the science of the engineer could look like a frozen science, cut off from scientific and technical developments, a heterogeneous accumulation of knowledge and especially of formulas, which would make it seem a tasteless by-product of mathematics and which leads companies and governments to many false results and bad decisions...
This book has also been designed to meet the needs of executives, both finance and non-finance managers.
Any executive who wants to probe further and grasp the fundamentals of strategic finance, strategic marketing, project management engineering and supply chain issues will benefit from reading it.
This book also aims to describe and explain how our Universe and our World (and other "worlds" in our Universe) work in a much more accurate, complete and detailed way than any Holy Book. It gives models and quantification methods for the origin of species, galaxies and planets, for quantum phenomena, physical motion, stellar physics, extreme observable events and extremely rare events, and it explains social strategies and modern technologies in a mathematical and provable way that everyone can check by himself, always exposing the assumptions that any reasonable entity should take care of!
Obviously, Applied Mathematics is such an abundant topic that a book of this scale can only cover the basics. Readers are certainly encouraged to go beyond it (see the bibliography at the end of the book).
Now, those who see Applied Mathematics only as a tool (which it also is), as the enemy of religious beliefs, or as a boring school subject are legion. However, it is perhaps useful to recall that, as Galileo said, "the book of nature is written in the language of mathematics" (without wishing to indulge in scientism!). It is in this spirit that this book presents Applied Mathematics for students in the Natural, Earth and Life sciences, as well as for all those whose occupation is related to the various subjects covered (including philosophy), and for anyone curious to learn about the role of science in everyday life.
The choice to study engineering in this book as a branch of Applied Mathematics comes from the fact that the differences between all areas of physics (formerly known as "natural philosophy") and mathematics are so slight that the Fields Medal (today the highest award in mathematics) was awarded in 1990 to the physicist Edward Witten, who used physical ideas to prove a mathematical theorem. This trend is certainly not fortuitous, because we can observe that every science, as it seeks a more detailed understanding of its subject, always ends up in pure mathematics (the absolute path par excellence...). Thus, we can predict, in the far future, the convergence of all sciences (pure, exact or social) toward mathematics for modeling techniques (see for example the French PDF "L'explosion des mathématiques" available on the download page of the companion website).
It can sometimes seem difficult (because of an irrational, obscure and unjustified fear of the pure sciences in a large fraction of our contemporaries) to convey the feeling of the mathematical beauty of nature, its deepest harmony and the well-oiled mechanics of the Universe to those who know only the basics of algebra. The physicist Richard Feynman once spoke of "two cultures": people who have, and those who do not have, sufficient understanding of mathematics to appreciate the scientific structure of nature. It is a pity that mathematics is necessary to deeply understand nature and that it also has a bad reputation. For the record, it is said that a King who asked Euclid to teach him geometry complained about its difficulty. Euclid replied: "There is no royal road". Physicists and mathematicians cannot convert themselves to a different language. If you want to learn about nature and appreciate its true value, you must understand its language.
Nature reveals itself only in this form, and we cannot be so presumptuous as to ask it to change this fact. In the same way, no intellectual discussion will allow you to communicate to a deaf person what you really feel while listening to music. Similarly, all the discussion in the world remains powerless to convey an intimate understanding of nature to those of the "other culture". Philosophers and theologians may try to give you qualitative ideas about the Universe. The fact that the scientific method (in the full sense of the term) cannot convince the world of its truth and purity is perhaps due to the limited horizon of some people who imagine that humans, or some other intuitive, sentimental or arbitrary concept, are the center of the Universe (anthropocentric principle).
Of course, since the aim is to share this mathematical knowledge, it may seem paradoxical to add, with our work, to the long list of books already available in libraries, in stores and on the Internet. Nevertheless, I must be able to present arguments that justify the creation of such a book (and its associated website) compared with books such as those of Feynman, Landau or Bourbaki, and with Wikipedia/Wolfram themselves, Khan Academy or OpenStax. So what do I think I can add to such a wealth of material?
1. The great pleasure that we take in writing this book (to "keep our hand in" and improve our skills) and in having a detailed, high-quality compendium of tools for our customers and our students (and for everyone around the world) for free.
2. The passion for sharing knowledge for free (fighting against "copyright madness" (RIP Aaron Swartz!)) and without frontiers, with a quality tool such as LaTeX (unlike Wikipedia, which mixes LaTeX and normal text, and the awful and shameful content of Khan Academy²).
3. Because we cannot wait, as there are places in the world where the absence of teaching of modern science and its methodology leads people to hold beliefs that take them down dangerous and obscure paths.
4. We want to offer Applied Mathematics in an enjoyable and easy-to-learn manner ("keep it simple, stupid", unlike the 9 graduate-level Landau books), because we believe that Applied Mathematics changes the way we understand the Universe.
5. This book was first written in French (in 2001), before the French version of Wikipedia had good mathematical content and long before Khan Academy or OpenStax even existed.
6. The opportunity for quick updates/corrections (unlike Khan Academy) and for collaboration offered by a free e-book (with effective associated search tools), without topics that disappear (unlike Wikipedia).
7. The content depends on readers' requests/comments and on our interests (unlike Khan Academy, OpenStax or the Landau books)!
8. Unlike scientific publications (PRL or similar), which are frustrating because they do not give detailed proofs and sometimes send the reader into an infinite loop of references.
9. Access to the LaTeX sources for everybody, so that nobody needs to reinvent the wheel and lose hundreds or thousands of hours on writing instead of innovation (unlike the Landau books)!
10. A rigorous presentation with simplified, detailed proofs of all presented concepts (unlike Wikipedia, Khan Academy and OpenStax, which focus only on the proofs of undergraduate concepts).
11. The presentation of many advanced and detailed mathematical tools used in business and R&D.
12. The opportunity for students and teachers to reuse content by copy/paste (unlike Khan Academy or the Landau books).
²OpenStax has good undergraduate PDFs (especially the examples in their books), but between 40% and 60% of the proofs are missing, the tables of contents and indexes of their PDFs are not interactive and, a major issue, the content is limited to undergraduate subjects.
13. A constant and fixed notation for mathematical operators throughout the book (unlike Wikipedia, Khan Academy and OpenStax), clear language on all topics (3C criterion: clear, complete and concise) and a focus on the basics, in order to do important pedagogical work on each subject (unlike Landau's books).
14. Gathering as much information as possible about the pure and exact sciences in one electronic (portable), homogeneous and rigorous book (but without going as far as Landau's books).
15. Freedom from all pseudo-truths: only truths that can be proven.
16. Benefiting from the development of teaching methods that use the Internet to search for solutions to mathematical problems.
17. The dramatic improvement of automatic translation software and computing power, which will make this book, at least we hope, a reference in the fields of science.
18. And... because Applied Mathematics is beautiful, especially when written in LaTeX and illustrated (unlike the Landau books, whose illustrations are quite old and poor).
And also... I believe that the results of individual research are the property of humanity and should be available to all those who explore the phenomena of nature anywhere. In this way, the work of each benefits all, and it is for all humanity that our knowledge accumulates; this is the trend that the Internet makes possible. I do not hide that, to this day, my contribution is largely limited to that of a collector who gleans his information from the works of masters, from publications or from anonymous web pages, and who completes and argues the developments and improves them when possible.
For those who would accuse me of plagiarism: they should consider that the theorems presented in most non-free, commercially available books were discovered and written by their predecessors, and that their own personal contribution was also, like mine, to put all this information into a clear and modern form a few hundred years later. In addition, it is questionable to charge for access to a culture that is certainly the only truly valid and fair one in this world and where there are no patents or intellectual property rights.
This book also reflects my own intellectual limitations. Although I try to study as many fields of science and math as possible, it is impossible to master them all. This book clearly shows only my own interests and experiences as a consultant, but also my strengths and my weaknesses. I am responsible for the selection of inputs and, of course, for possible errors and imperfections.
After attempting a strict (linear) order of presentation of the subject, I decided to arrange this book in a more pedagogical (thematic) way, always with practical examples or applications. In my opinion, it is very difficult to cover such a vast subject in a purely mathematical order within a single human life, that is to say, one in which the concepts are introduced one by one from those already known (where no theory, operator, tool, etc., would appear before its definition). Such a plan would require cutting the book into pieces that are no longer thematic. So I decided to present things in a logical order and not in order of need. Thus the reader will encounter, as the editor himself did, the extreme complexity of the subject.
The consequences of this choice are the following:
1. Sometimes it will be necessary to accept certain concepts and understand them only later.
2. The reader will probably need to go through the book at least twice. At the first reading, we grasp the essentials and, at the second reading, we understand the details (I congratulate those who understand all the subtleties the first time).
3. You must accept the fact that some topics are repeated and that there are many cross-references and complementary remarks.
As some know, for every theorem and mathematical model there are almost always several methods of proof. I have always tried to choose the one that seemed the simplest (e.g. in relativity and quantum physics there are the algebraic and the matrix formalisms). The objective is to arrive at the same result anyway.
This book being in its draft version, it necessarily has gaps regarding convergence controls, continuity, grammar and other things... (which will horrify some readers and mathematicians...)! However, I have avoided (or otherwise indicated) the usual approximations of physics, and I have used dimensional analysis as little as possible. I also try to avoid, as much as possible, subjects with mathematical tools that have not previously been presented and rigorously demonstrated.
Finally, this presentation, which can still be improved, is not an absolute reference and contains errors. Any comment is welcome. I shall endeavour, as far as possible, to correct the weaknesses and make the necessary changes as soon as possible.
However, while mathematics is accurate and indisputable, theoretical physics (its models) is still interpreted in common vocabulary (but not in mathematical vocabulary) and its conclusions are all relative. I can only advise you, when you read this book, to think for yourself and not to be subject to outside influences. You must have a very (very) critical mind, take nothing for granted and question everything without hesitation. In addition, the motto of a good scientist should be: "Doubt, doubt, doubt... doubt still, and always check.".
We also recall that "nothing that we can see, hear, smell, touch or taste is what it seems to be"; therefore, do not rely on your daily experience to draw hasty conclusions: be critical, Cartesian, rational and rigorous in your developments, reasoning and conclusions!
I want to say to those who try to find for themselves the results of some developments in this book: do not worry if you do not succeed, or if you doubt your competence because of the time spent solving an equation or a problem. Some theories that seem obvious or easy today sometimes needed several weeks, months, even years, to be developed by leading mathematicians or physicists of the past!
I also tried to ensure that this book is pleasing to the eye and to read.
Finally, I have chosen to write this work in the first person plural: "we". Indeed, mathematical physics is not a science that has been built or has evolved through individual work, but through intensive collaboration between people connected by the same passion and desire for knowledge. Thus, by using "we", I would like to pay tribute to the dead and missing scientists, and to contemporary and future researchers, for the work they will perform in order to approach truth and wisdom.
[Figure: PURE MATH / APPLIED MATH]
Methods
Science is the set of all systematic efforts (scrupulous observations and plausible assumptions until there is evidence to the contrary) to acquire knowledge about our environment and to organize and synthesize it into testable laws and theories, whose main purpose is to explain the "how" of things (and NOT the why!), often by a five-step approach:
- What do we have?
- Where will we go?
- What is our goal?
- Does it fit the data?
Scientists have to submit their ideas and results to independent verification and replication by their peers ("peer review").
They must abandon or modify their conclusions when confronted with more complete or different evidence. The credibility of Science therefore rests on this self-correcting mechanism, and this is why, in the 21st century, Science is not necessarily the best possible tool (as we do not know what will exist in the future...), but it has proven to be the best method of investigating the truth compared with all other currently existing methods or beliefs. The history of science shows that this system has worked for a very long time and very well compared with all the others. In each area, progress has been spectacular. However, the system has sometimes failed and also has to be corrected before small drifts accumulate.
The downside is that scientists are human. They have the imperfections of all humans, especially vanity, pride and conceit. Nowadays, it happens that many people working on the same topic for a given time develop a common faith and believe they hold the truth. The leader of the faith is the Pope, who distills his opinion. The Pope, playing the game, takes his miter and his pilgrim's staff to evangelize his fellow heretics. Up to this point, this makes us smile. But, as in real religions, they sometimes become annoying by wanting to extend their opinion to those who do not believe. Some of these "churches" do not hesitate to behave like the Inquisition. Those who dare to express a different opinion are burned at every opportunity, during conferences or at their workplace. Some young researchers, lacking inspiration, prefer to convert to the dominant religion to become clerics faster, rather than innovative researchers or even iconoclasts. The great Pope writes his Bible to disseminate his ideas and makes it required reading for students and newcomers. He then shapes the thinking of younger generations and secures his throne. This is a medieval attitude that can block progress.
Some Popes go so far as to believe that being the pope in their field of specialization automatically gives them the same throne in all other areas...
This warning, and the reminders that will follow, should lead scientists to question themselves and to make good use of what we consider today to be good working practices (we will discuss the principles of Descartes' method below) to solve problems or develop theoretical models.
For this purpose, here is a summary table that provides the steps that should be followed by a scientist who works in mathematics or theoretical physics (for definitions, see just below):
Mathematics
1. State formally or in common language the "hypothesis", the "conjecture" or the "property" to prove (hypotheses are denoted H1., H2., etc., conjectures CJ1., CJ2., etc. and properties P1., P2., etc.).
2. Define the "axioms" (non-demonstrable, independent and non-contradictory) that will give the starting points and establish restrictions on the development (axioms are denoted A1., A2., etc.)¹. In the same vein, mathematicians define the specialized vocabulary related to mathematical operators, denoted D1., D2., etc.
3. Once the axioms are laid down, directly derive the "lemmas" or "properties" whose validity follows immediately, and prepare the development of the theorem supposed to validate the initial hypotheses or conjectures (lemmas are denoted L1., L2., etc. and properties P1., P2., etc.).
4. Once the "theorems" (denoted T1., T2., etc.) are proved, conclude on "consequences" (denoted C1., C2., etc.) and even properties (denoted P1., P2., etc.).
5. Test the strength (robustness) or usefulness of the conjectures or hypotheses by proving the converse of the theorem, or by comparing them with other well-known mathematical theories to see whether they form a coherent structure together (examples are denoted E1., E2., etc.).
6. Possible remarks may be shown in a hierarchically structured order and denoted R1., R2., etc.
Physics
1. State correctly, formally or in common language, all the details of the "problems" to solve (problems are denoted P1., P2., etc.).
2. Define (or state) the "postulates" or "principles", or the "hypotheses" and "assumptions" (supposedly unprovable...), that will give the starting point and establish restrictions on the developments (typically, postulates and principles are denoted P1., P2., etc. and assumptions H1., H2., etc., while trying to avoid notational confusion between postulates and principles)².
3. Once the "theoretical model" is developed, check the units of the equations for possible errors in the developments (such checks are denoted VA1., VA2., etc.).
4. Search for borderline cases (including "singularities") of the model to verify its validity intuitively (these borderline checks are denoted CL1., CL2., etc.).
5. Experimentally test the theoretical model obtained and submit the work for comparison with other independent research teams. The new model should predict experimental results that have never been observed (falsifiable predictions). If the model is validated, it obtains the official status of "theory".
6. Possible remarks may be shown in a hierarchically structured order and denoted R1., R2., etc.
Table 3.1 - Methodology for Maths & Physics Developments
Proceeding as in the above table is one possible working basis for people working in mathematics and physics. Obviously, proceeding cleanly and traditionally as above takes a little more time than doing things any old way (this is why most teachers do not follow these rules: they do not have enough time to cover the entire course program).
¹Sometimes "properties", "conditions" and "axioms" are confused, while the concept of axiom is much more precise and profound.
²You should not forget, however, that the validity of a model does not depend on the realism of its assumptions but on the conformity of its implications with reality.
[Figure: diagram of the scientific process (definitions, hypothesis, theoretical results, empirical observations, changes in hypothesis)]
Note also this fun version of the 8 scientific commandments:
1. The phenomena you will observe
And never measurements you will falsify
(beware of confirmation bias: studying only the phenomena that validate your beliefs)
2. Hypotheses you will propose
That by experiment you will test
3. The experiment precisely you will describe
Because your colleague will reproduce it
(beware of the narrative trap: fitting the facts to the desired results)
4. With your results
A theory you will build
5. Parsimony you will use
And the simplest hypothesis you will retain
6. Ultimate truth will never be (epistemic humility)
And always you will search for the truth
7. From a non-refutable thesis you will refrain
Because outside of science it will remain
8. All failures will be like a success
Because science can confirm but also invalidate
Remarks
R1. Caution! It is very easy to make new physical theories just by stringing words together. This is called "philosophy", and the Greeks thought of atoms in this way. With a lot of luck, this can lead to a true theory. On the other hand, it is much more difficult to make a "predictive theory", that is to say, one with equations that predict the outcome of an experiment.
R2. What separates mathematics from physics is that in mathematics the hypothesis is always true. Mathematical discourse does not seek to prove an external truth; it aims at consistency. What must be correct is just the reasoning.
When these rules are not respected, we speak of "scientific fraud" (which often leads to being fired but, unfortunately, diplomas are still not withdrawn when it happens).
In general, scientific fraud itself comes in three main forms: plagiarism; fabrication of data and alteration of results unfavourable to the hypothesis; and omission of the working hypotheses and of the collected data. To these frauds we can add behaviours that raise problems regarding the quality of the work or, more specifically, ethics, such as those aimed at inflating apparent output (and with it the fame of the scientist): for example, submitting the same publication several times with only a few modifications, omitting conflicts of interest, running dangerous experiments, failing to preserve primary data, etc.
2.1 Descartes' Method
We now present the four principles of Descartes' method which, as a reminder, is considered the first scientific method in history because of its analytical approach:
P1. Never accept anything as true that I did not evidently know to be such. That is to say, carefully avoid precipitation and include nothing more in my judgments than what presents itself so clearly and distinctly to my mind that I have no occasion to doubt it.
P2. Divide each of the difficulties I have to examine into as many parts as possible, and as many as would be required to resolve them in the best way (scrupulous observations and plausible hypotheses until there is evidence to the contrary).
P3. Conduct my thoughts in order, beginning with the simplest objects and the easiest to know, to rise gradually, by degrees, to the knowledge of the most complex, and even assuming an order among those that do not naturally precede each other.
P4. Everywhere make enumerations so complete and reviews so general that I am sure to omit nothing.
2.2 Archimedean Oath
Inspired by the Hippocratic Oath, a group of students of the École Polytechnique Fédérale de Lausanne developed in 1990 an Archimedean Oath expressing the responsibilities and duties of engineers and technicians.
It was adopted in various versions by other European engineering schools and could serve as a basic inspiration for an oath for scientific researchers (even if some important points are missing).
"Considering the life of Archimedes of Syracuse, who illustrated as early as Antiquity the ambivalent potential of technology, considering the increasing responsibility of engineers and scientists towards people and nature, considering the importance of the ethical problems raised by technology and its applications, today I pledge the following commitments and will endeavor to tend towards the ideal they represent:
1. I will practice my profession for the good of people, respecting Human Rights and the Environment.
2. I will recognize, being as well informed as possible, the responsibility for my acts and will in no case discharge it onto others.
3. I will endeavor to perfect my professional competences.
4. In the choice and realization of my projects, I will remain attentive to their context and their consequences, in particular from the technical, economic, social and ecological points of view... I will pay particular attention to projects that may have military purposes.
5. I will contribute, to the extent of my means, to promoting equitable relationships between humans and to supporting the development of lower-income countries.
6. I will transmit, with rigor and honesty, to interlocutors chosen with discernment, any important information if it represents an asset for society or if withholding it constitutes a danger to others. In the latter case, I will take care that the information leads to concrete measures.
7. I will not let myself be dominated by the defense of my own interests or those of my profession.
8. I will make an effort, to the extent of my means, to lead my company to take into account the concerns of this Oath.
9. I will practice my profession with complete intellectual honesty, with conscience and dignity.
10. I promise it solemnly, freely and on my honor."
2.3 Scientific Publication Rules (SPR)
It is impossible to have a constructive debate or analysis if the base material is unusable. Sadly, even in the 21st century it is easy to find Nobel Prize publications that were peer-reviewed and yet are scientifically unusable. This is why we recall here the basic rules a publication must follow to be accepted by a real scientific peer-review committee:
1. Use LaTeX to write the publication
2. All writing files and raw data files must have ISO-compliant names
3. The publication should have a GUID
4. Put the publication date in the publication
5. Put the major and minor version of the publication (e.g. v3.6 r58)
6. Put the dates of the experiment (development) period (ISO date format)
7. Write an abstract
8. Write an introduction
9. All measurement units must follow ISO standards
10. Use the "principle of precaution" (use of the conditional)
11. Use "reactive responses", that is to say confront hypotheses with data, hypotheses with facts, and hypotheses with observations
12. Use, when available, "leverage factors" to give substance and credit to the work by referring to other publications on the same subject⁵
13. Materials and methods should be described in detail. Theoretical papers should provide a link (URL) or reference where the full detailed proof can be found (if the detailed proof is omitted in the original publication!)
14. Include high-resolution screenshots of charts or photos
15. Write the results and, for experimental data, always provide a statistical analysis showing whether the effect seems significant or not (including the sample-size effect or the fluctuation interval)
16. Calculate the propagation of the measurement instruments' errors
17. Write a cautious conclusion
18. Give the scientific community access to the raw data in a non-proprietary format
19. Give the scientific community access to the scripts/code used for data analysis
20. Give the scientific community access to the LaTeX sources of the publication
21. Provide the exact version (including minor release) of the software used to publish the paper
22. Include the bibliography with the references
23. State the percentage of financial support from each sponsor
24. Submit the paper to the peer-review committee
25. List all actors (with position, grade, e-mail) and peer-reviewers (name only for the latter) of the paper
5 This is also the very important step of the "personal review", that is to say a personal analysis of several tens or hundreds of scientific publications, from which you have made a critical analysis that you use to build your own argument.
Any publication that fails to respect even one of these rules cannot be considered a "scientific" publication!
Remark
Even if there is a consensus among scientists, a single oriented study (which can be very important) can be used to influence the opinion of mainstream media, governments and people. This is why a study must always be done and peer-reviewed by independent teams and laboratories.
2.4 Scientific Mainstream Media communication
Readers of mainstream media or social networks must never trust a scientific study if the peer-reviewed reference paper is not given as a link. Nor should readers take a study as absolute when the "consensus" of the scientific community rests on only... ONE... study. The only way to be almost sure is to read the study itself and check whether it respects the above protocol.
A typical bad example is a news item about Lyme borreliosis that was picked up by many international mainstream media:
A simple azithromycin-based antibiotic ointment has proven its efficacy against Lyme borreliosis, a serious disease transmitted by ticks, according to a study with Swiss participation published on Tuesday. Applied for three days, no later than 72 hours after the tick bite, the ointment showed 100% efficacy according to tests carried out on 1,000 patients: none developed Lyme borreliosis. Meanwhile, seven infections occurred in the group treated with a placebo, according to the results of this research published in The Lancet.
Avoiding three weeks of antibiotics
First identified in the United States in 1975, Lyme disease, an infection of bacterial origin, can lead to serious neurological and joint complications if it is not [...]
Figure 3.1 - Swiss TV publication about Lyme borreliosis treatment on 2017-01-08 (source: RTS App)
In summary, what the "scientific journalist" (hmm hmm... I think it must have been a new intern in fact...) of one of the main Swiss national television channels (so a broadcaster that has enough money to investigate any news correctly... at least in theory... in a country that claims to be number one in almost everything...) published is a very bad (catastrophic) interpretation of the real article. The above article reports that: "...a treatment applied for 3 days no later than 72 hours after the tick bite has shown an efficacy of 100%..." In reality (had the media read the publication to the end...), the study was stopped after 8 weeks and it was shown that the treatment had no better effect than a placebo...
Vocabulary
Physics and mathematics, like any field of specialization, have their own vocabulary. So that readers do not get lost in the texts they may read in this PDF, we have chosen to present here a few fundamental words, abbreviations and definitions to know.
For example, mathematicians like to finish their proofs (when they think they are correct) with the abbreviation "Q.E.D.", which stands for "Quod Erat Demonstrandum" (Latin for "which was to be demonstrated"). And in definitions (there are many in math and physics...) scientists often use the following terminology:
• ... it is sufficient that ...
• ... if and only if ...
• ... necessary and sufficient ...
• ... means ...
• ... prove it ...
These expressions are not equivalent (identical in the strict sense). For example, "it is sufficient that" corresponds to a sufficient condition but not to a necessary condition (x > 2 is sufficient for x² > 4, but not necessary, since x = -3 also satisfies x² > 4). It must also be noted that these expressions belong to the context of data analysis, data accuracy, reproduction and peer review, and not to any personal or common belief or the emotional reaction of a group of people (even if this group numbers more than a few billion individuals...)!
3.1 On Sciences
It is important that we rigorously define the different types of sciences to which humans often refer. Indeed, it seems that in the 21st century a misuse of the term has become established, and that people can no longer distinguish the "intrinsic quality" of one "science" from another.
Remark
Etymologically, the word "science" comes from the Latin "scientia" (knowledge), whose root is the verb "scire", which means "to know".
This abuse of language is probably due to the fact that the pure and exact sciences have lost their illusions of universality and objectivity, in the sense that they are self-correcting.
As a result, some sciences relegated to the background try to borrow these methods, principles and origins, which creates confusion. We must therefore be very careful about claims of scientificity in the human sciences, and this is also (or especially) true for the dominant trends in economics, sociology and psychology. Quite simply, the issues addressed by the human sciences are extremely complex and poorly reproducible, and the empirical arguments supporting their theories are often rather weak.
By itself, however, science does not produce absolute truth. In principle, a scientific theory is valid as long as it can predict measurable and reproducible results. But the problems of interpreting these results belong to natural philosophy.
Given the diversity of phenomena to be studied, a growing number of disciplines has appeared over the centuries, such as chemistry, biology, thermodynamics, etc. All these disciplines, a priori heterogeneous, have physics as their common foundation, mathematics as their language and the scientific method as their elementary principle.
Thus, a small memory refresh seems useful:
Definitions (#1):
D1. We define as "pure science" any set of knowledge based on rigorous reasoning that is valid whatever the (arbitrary) elementary factor selected (we then say "independent of sensible reality") and restricted to the minimum necessary. Only mathematics (often named the "queen of sciences") can be classified in this category.
D2. We define as "exact science" or "hard science" any set of knowledge based on the study of an observation, an observation that has been transcribed in symbolic form and that can be reproduced (theoretical physics for example... sometimes...). Primarily, the purpose of the exact sciences is to explain not the "why" but the "how". And never forget...
Science (especially physics) doesn't have to "make sense"; it just has to make all the right, testable predictions!
According to the philosopher Karl Popper, a theory is scientifically acceptable if, as presented, it is "falsifiable" (synonyms are "refutable" or "testable"), i.e. if it can be subjected to experimental tests (that is, if it is possible to conceive of an observation or an argument that negates the statement in question). "Scientific knowledge" is then by definition the set of theories that have resisted falsification. Science is by nature subject to continuous questioning.
Caution! There is no doubt that the exact sciences still enjoy enormous prestige, even among their opponents, because of their theoretical and practical success. It is certain that some scientists sometimes abuse this prestige by displaying a sense of superiority that is not necessarily justified. Moreover, it often happens that these same scientists present very speculative ideas in the popular literature as if they were well established, and extrapolate their results outside the context in which they were tested (and... outside the hypotheses under which they were checked once...).
D3. We define as "engineering science" any set of knowledge or practices applied to the needs of human society, such as electronics, chemistry, computer science, telecommunications, robotics, aerospace, biotechnology...
D4. We define as "science" any body of knowledge based on studies or observations of events whose interpretation has not yet been transcribed and verified with the mathematical rigour characteristic of the previous sciences, but which uses comparative statistics. We include in this definition: medicine (we should however be careful, because some parts of medicine study phenomena using mathematical descriptions, such as neural networks, or other phenomena associated with known physical causes), sociology, psychology, history, biology, etc.
Some teachers like to play with the word "science" as an acronym (which is not a bad idea for college students): Solve, Create, Investigate, Evaluate, Notice, Classify, Experiment.
D5. We define as "soft science" or "para-science" any set of knowledge or practices that are currently based on facts not verifiable (not scientifically reproducible) by experiment or by mathematics. We include in this definition: astrology, theology, the paranormal (which was demolished by zetetic science), graphology... As some scientists say: «It looks like science, it uses the vocabulary of science... but it is not science at all.»
D6. We define as "phenomenological science" or "natural sciences" any science that is not included in the above definitions (history, sociology, psychology, zoology, biology, ...).
D7. "Scientism" is an ideology that considers experimental science to be the only valid mode of knowledge or, at least, superior to all other forms of interpretation of the world. In this perspective, there are no philosophical, religious or moral truths superior to scientific theories. Only what is scientifically proven counts.
D8. "Positivism" is a set of ideas holding that only the analysis and understanding of facts verified by experience can explain the phenomena of the sensible world. Certainty is provided solely by scientific experiment. Positivism rejects introspection, intuition and the metaphysical approach as ways of explaining any knowledge of phenomena.
What is interesting about this doctrine is that it is certainly one of the few that requires people to think for themselves and to understand the environment around them by continually questioning everything and never taking anything for granted (...).
In addition, the real sciences have this extraordinary property: they make it possible to understand things beyond what we can see.
But science is science, and nothing more: a certain ordering, with fairly good success, of things that no longer leads to metaphysics as in the time of Aristotle, but that does not claim to tell us the whole story about reality or even about the depths of visible things.
3.2 Terminology
The table of methods presented above contains terms that may seem unknown or barbaric to you. This is why it seems important to define them, along with some other equally important terms, to avoid serious confusion.
Definitions (#2):
D1. Beyond its negative sense, the idea of "problem" refers to the first step of the scientific method. Formulating a problem is essential for its resolution: it allows us to properly understand what the problem is and to see what needs to be resolved. The concept of "problem" is intimately connected to the concept of "assumption", whose definition we will see below.
D2. A "hypothesis" is always, in the context of an already established or underlying theory, a supposition awaiting confirmation or refutation that attempts to explain a group of facts or to predict the occurrence of new facts. Thus, a hypothesis can be at the origin of a theoretical problem that has to be resolved formally.
D3. The "postulate" or "assumption" in physics frequently corresponds to a principle (see definition below) whose admission is required to establish a proof (meaning that it is a non-provable proposition). The mathematical equivalent (but in a more rigorous version) of the assumption is the "axiom", whose definition we will see below.
D4. A "principle" (a close relative of the "postulate") is a proposition accepted as a basis for reasoning, or a general theoretical guideline for the reasoning to be performed. In physics, it is also a general law governing a set of phenomena and verified by the accuracy of its consequences.
The word "principle" is abused in lower classes and engineering schools by teachers who do not know how (which is very rare), who are unwilling (rather common), or who cannot for lack of time (almost always the case) prove a relation.
The equivalent of the postulate or principle in mathematics is the "axiom", which we define as follows:
D5. An "axiom" is a self-evident proposition, true by itself, whose admission is necessary to establish a proof.
Remarks
R1. We could say that an axiom is something we define as the truth for the discourse we are conducting, like a rule of the game, and that it does not necessarily have a universal truth value in the sensible world around us.
R2. Axioms must always be independent (none should be provable from the others) and non-contradictory (we sometimes also say that they must be "consistent").
D6. "Corollary" is a term unfortunately almost nonexistent in physics (wrongly so!); a corollary is in fact a proposition resulting from a truth already demonstrated. We can also say that a corollary is an obvious and necessary consequence of a theorem (or sometimes of a postulate in physics).
D7. A "lemma" is a proposition deduced from one or more assumptions or axioms, whose proof prepares that of a theorem.
D8. A "conjecture" is a supposition or opinion based on the likelihood of a mathematical result. Many conjectures play a role somewhat similar to lemmas, as they serve as checkpoints towards significant results.
D9. Beyond its weak sense of conjecture, a "theory" or "theorem" is a set articulated around a hypothesis and supported by a set of facts or developments that give it positive content and make the hypothesis well-founded (or at least plausible in the case of theoretical physics).
D10. A "singularity" is an indeterminacy in a calculation that takes the form of a division by zero. This term is used both in mathematics and in physics.
D11. A "proof" is a set of mathematical procedures to follow to prove the result, already known or not, of a theorem.
D12. If the word "paradox" etymologically means "contrary to common opinion", this is not out of pure taste for provocation but for solid reasons. A "sophism", meanwhile, is a deliberately provocative statement: a false proposition based on apparently valid reasoning. Thus we speak of "Zeno's paradox" when in reality it is only a sophism. The paradox is not limited to falsity but implies the coexistence of truth and falsity, so that one can no longer distinguish the true from the false. The paradox appears as an unsolvable problem, an "aporia".
Remark
It should be added that the well-known paradoxes, through the questions they raised, have enabled significant advances in science and led to major conceptual revolutions in mathematics as well as in theoretical physics (the paradoxes of sets and of infinity in mathematics, and those at the base of relativity and quantum physics).
Science and Faith
We will see that in science a theory is usually incomplete, because it cannot fully describe the complexity of the real world or because it does not predict what we don't know (except for Quantum Physics or General Relativity). This is the case for theories like the Big Bang (see section Astrophysics) or the Evolution of species (see sections Population Dynamics or Decision and Games Theory), because they are not reproducible in laboratories under identical conditions. But some other theories are so accurate at predicting physical phenomena that some people believe mathematics is the language nearest to God (at least those who believe in a divinity...).
We should distinguish between different scientific currents:
• "Realism" is a doctrine in which physical theories aim to describe reality as it is in itself, including its unobservable components.
• "Instrumentalism" is a doctrine in which theories are only tools to predict observations and do not describe reality itself.
• "Fictionalism" is the doctrine in which the referential content (principles and postulates) of theories is just an illusion, useful only to ensure the linguistic articulation of the fundamental equations.
Even if scientific theories are today backed by many specialists, alternative theories have valid arguments and we cannot totally dismiss them. However, the creation of the world in seven days as described in the Bible is difficult to accept, and many believers recognize that a literal reading of the Bible is not compatible with the current state of our knowledge and that it is more prudent to interpret it as a parable.
If science never provides a definitive answer, it can no longer be ignored. Faith (whether religious, superstitious, pseudo-scientific or other), on the contrary, is intended to provide absolute truths of a different nature, as it is a personal, unverifiable belief. In fact, one of the functions of religion is to give meaning to phenomena that cannot be explained rationally. The progress of knowledge through science therefore sometimes calls religious dogma into question. Conversely, unless we want to impose our own faith (which is nothing but a subjective and intimate personal conviction) on others, we must beware of the natural temptation to label extrapolations of scientific models beyond their scope as scientifically proven facts.
The word "science" is, as already mentioned, increasingly used to claim that there is scientific evidence where there is only belief (web pages of this kind keep proliferating). According to its detractors, this is for example the case of the Scientology movement (but there are many others). According to them, we should rather speak of "occult sciences".
The occult sciences and traditional sciences have existed since antiquity; they consist of a body of mysterious knowledge and practices designed to penetrate and dominate the secrets of nature. Over the past centuries, they have been progressively excluded from science.
The philosopher Karl Popper long questioned the nature of the demarcation between science and pseudoscience. After noticing that it is possible to find observations confirming almost any theory, he proposed a methodology based on falsifiability. According to him, to deserve the adjective "scientific", a theory must guarantee the impossibility of certain events. It then becomes refutable, and therefore (and only then) capable of joining science. Observing any one of these events would suffice to invalidate the theory, and thus open the way to improving it.
Let us also note a major difference between science books and religious books: if you destroyed the latter, in a thousand years' time they would not come back just as they were. Whereas if we took every science book and every fact and destroyed them all, in a thousand years they would all be back, because the same tests would give the same results.
4.0.1 Baloney detection kit
Through their training, scientists are equipped with what Carl Sagan named the "baloney detection kit" or "bullshit detection kit": a set of cognitive tools and techniques that fortify the mind against penetration by falsehoods and help draw boundaries between science and pseudoscience. It isn't merely a tool of science; it contains invaluable tools of healthy skepticism that apply just as elegantly, and just as necessarily, to everyday life. By adopting the kit, we can all shield ourselves against clueless guile and deliberate manipulation.
There are many versions of this detection tool, but here is a fairly complete one (though still incomplete by construction) proposed by Michael Shermer (founding publisher of Skeptic Magazine and author of The Borderlands of Science):
1. How reliable is the source of the claim?
Pseudoscientists often appear quite reliable, but when examined closely, the facts and figures they cite are distorted, taken out of context or occasionally even fabricated. Of course, everyone makes some mistakes. And as historian of science Daniel Kevles showed so effectively in his book The Baltimore Affair, it can be hard to detect a fraudulent signal within the background noise of sloppiness that is a normal part of the scientific process. The question is: do the data and interpretations show signs of intentional distortion? When an independent committee established to investigate potential fraud scrutinized a set of research notes in Nobel laureate David Baltimore's laboratory, it revealed a surprising number of mistakes. Baltimore was exonerated because his lab's mistakes were random and nondirectional... So in science there are no authorities. At most, there are experts!
2. Does this source often make similar claims?
Pseudoscientists have a habit of going well beyond the facts. Flood geologists (creationists who believe that Noah's flood can account for many of the earth's geologic formations) consistently make outrageous claims that bear no relation to geological science. Of course, some great thinkers do frequently go beyond the data in their creative speculations. Thomas Gold of Cornell University is notorious for his radical ideas, but he has been right often enough that other scientists listen to what he has to say. Gold proposes, for example, that oil is not a fossil fuel at all but the by-product of a deep, hot biosphere (microorganisms living at unexpected depths within the crust).
Hardly any earth scientists with whom I have spoken think Gold is right, yet they do not consider him a crank. Watch out for a pattern of fringe thinking that consistently ignores or distorts data.
3. Have the claims been verified by another source?
Typically pseudoscientists make statements that are unverified or verified only by a source within their own belief circle. We must ask: who is checking the claims, and even who is checking the checkers? The biggest problem with the cold fusion debacle, for instance, was not that Stanley Pons and Martin Fleischmann were wrong. It was that they announced their spectacular discovery at a press conference before other laboratories verified it. Worse, when cold fusion was not replicated, they continued to cling to their claim. Outside verification is crucial to good science.
4. How does the claim fit with what we know about how the world works?
An extraordinary claim must be placed into a larger context to see how it fits. When people claim that the Egyptian pyramids and the Sphinx were built more than 10,000 years ago by an unknown, advanced race, they are not presenting any context for that earlier civilization. Where are the rest of the artifacts of those people? Where are their works of art, their weapons, their clothing, their tools, their trash? Archaeology simply does not operate this way.
5. Has anyone gone out of the way to disprove the claim, or has only supportive evidence been sought?
This is the "confirmation bias" (we will come back to cognitive biases in the section on Decision Theory), the tendency to seek confirmatory evidence and to reject or ignore disconfirmatory evidence. The confirmation bias is powerful, pervasive and almost impossible for any of us to avoid. It is why the methods of science that emphasize checking and rechecking, verification and replication, and especially attempts to falsify a claim, are so critical.
6. Does the preponderance of evidence point to the claimant's conclusion or to a different one?
The theory of evolution, for example, is "proved" through a convergence of evidence from a number of independent lines of inquiry. No one fossil, no one piece of biological or paleontological evidence has "evolution" written on it; instead tens of thousands of evidentiary bits add up to a story of the evolution of life. Creationists conveniently ignore this confluence, focusing instead on trivial anomalies or currently unexplained phenomena in the history of life.
7. Is the claimant employing the accepted rules of reason and tools of research, or have these been abandoned in favor of others that lead to the desired conclusion?
A clear distinction can be made between SETI (Search for Extraterrestrial Intelligence) scientists and UFOlogists. SETI scientists begin with the null hypothesis that ETIs do not exist and that they must provide concrete evidence before making the extraordinary claim that we are not alone in the universe. UFOlogists begin with the positive hypothesis that ETIs exist and have visited us, then employ questionable research techniques to support that belief, such as hypnotic regression (revelations of abduction experiences), anecdotal reasoning (countless stories of UFO sightings), conspiratorial thinking (governmental cover-ups of alien encounters), low-quality visual evidence (blurry photographs and grainy videos), and anomalistic thinking (atmospheric anomalies and visual misperceptions by eyewitnesses).
8. Is the claimant providing an explanation for the observed phenomena or merely denying the existing explanation?
This is a classic debate strategy: criticize your opponent and never affirm what you believe, to avoid criticism. It is next to impossible to get creationists to offer an explanation for life (other than "God did it").
Intelligent Design (ID) creationists have done no better, picking away at weaknesses in scientific explanations for difficult problems and offering in their stead "ID did it." This stratagem is unacceptable in science.
9. If the claimant proffers a new explanation, does it account for as many phenomena as the old explanation did? Many HIV/AIDS skeptics argue that lifestyle causes AIDS. Yet their alternative theory does not explain nearly as much of the data as the HIV theory does. To make their argument, they must ignore the diverse evidence supporting HIV as the causal vector of AIDS, including the significant correlation between the inadvertent introduction of HIV into the blood supply and the subsequent rise of AIDS among hemophiliacs.
10. Do the claimant's personal beliefs and biases drive the conclusions, or vice versa? All scientists hold social, political and ideological beliefs that could potentially slant their interpretations of the data (this is a form of "confirmation bias", also named "cherry picking", which is also, among non-scientists, the main cause of the rejection of scientific results and tools), but how do those biases and beliefs affect their research in practice? Usually, during peer review, such biases and beliefs are rooted out, or the paper or book is rejected.

Going into finer detail, we can say more about reasoning fallacies. Here is a more exhaustive list:
1. Ad hominem: An ad hominem argument attacks the messenger, not the message itself.
2. Argument from authority: An argument that relies on the identity of an authority rather than on the components of the argument itself.
3. Argument from adverse consequences: Saying that because the implications of a statement being true would create negative results, it must not be true.
4. Appeal to ignorance: If something is not known to be false, it must be true.
5. Special pleading: Stating a universal principle, then insisting that it does not apply to your own assertions for some reason.
6. Begging the question / assuming the answer: This occurs when a statement rests on an unproven premise. It is also named "circular reasoning" or "circular logic".
7. Observational selection: Looking only at positive evidence while ignoring the negative, or vice versa.
8. Statistics of small numbers: Using small numbers in order to report large percentage increases.
9. Misunderstanding of the nature of statistics: Ignorance of central statistical assumptions and of the definition of metrics (confusing correlation with causation, neglecting sample size, and the "hate of maths" bias are well-known examples).
10. Post hoc, ergo propter hoc: Attributing an effect to a cause solely on the basis of chronology.
11. Excluded middle, or false dichotomy: Portraying an issue or argument as having only two options and no spectrum in between.
12. Short-term vs. long-term: Assuming a current trend has remained constant throughout its history and will continue to do so in the future, even though no evidence suggests such an extrapolation is justified.
13. Slippery slope, related to excluded middle: Saying something is wrong because it is next to, or loosely related to, something wrong.
14. Suppressed evidence and half-truths: Drawing an unwarranted conclusion from premises that are at least in part correct.
15. Weasel words: The use of vague, non-specific references.

In addition to teaching us what to do when evaluating a claim to knowledge, any good baloney detection kit must also teach us what not to do. It helps us recognize the most common and perilous fallacies of logic and rhetoric. Many good examples can be found in religion and politics, because their practitioners are so often obliged to justify two contradictory propositions.
Finally, we would like to quote Lavoisier: «The physicist may also, in the silence of his laboratory and his study, perform patriotic functions; he can, through his work, reduce the mass of evils that afflict humankind and increase its happiness; and had he contributed, by the new roads he has opened, only to prolonging the average life of humans by a few years, or even by a few days, he could also aspire to the glorious title of benefactor of humanity.»

4 Arithmetic

Mathematics is the ultimate form of forced art. (unknown)

Contents
1 Proof Theory 54
1.1 Paradoxes 59
1.2 Propositional Calculus 62
1.3 Predicate Calculus 79
1.4 Proofs 86
2 Numbers 97
2.1 Digital Bases 100
2.2 Type of Numbers 103
3 Arithmetic Operators 156
3.1 Binary Relations 156
3.2 Fundamental Arithmetic Laws 167
3.3 Arithmetic Polynomials 183
3.4 Absolute Value 184
3.5 Calculation Rules (operators priorities) 187
4 Number Theory 192
4.1 Principle of good order 192
4.2 Induction Principle 193
4.3 Divisibility 195
5 Set Theory 231
5.1 Zermelo-Fraenkel Axiomatic 235
5.2 Set Operations 245
5.3 Functions and Applications 253
6 Probabilities 269
6.1 Event Universe 270
6.2 Kolmogorov’s Axioms 271
6.3 Conditional Probabilities 277
6.4 Martingales 297
6.5 Combinatorial Analysis 299
6.6 Markov Chains 308
7 Statistics 313
7.1 Samples 315
7.2 Averages 316
7.3 Type of variables 339
7.4 Fundamental postulate of statistics 366
7.5 Diversity Index 367
7.6 Distribution Functions (probabilities laws) 369
7.7 Likelihood Estimators 476
7.8 Finite Population Correction Factor 488
7.9 Confidence Intervals 491
7.10 Weak Law of Large Numbers 528
7.11 Characteristic Function 532
7.12 Central Limit Theorem 536
7.13 Univariate Hypothesis and Adequation tests 542
7.14 Robustness 654
7.15 Multivariate Statistics 699
7.16 Survival Statistics 742
7.17 Propagation of Errors (experimental uncertainty analysis) 757
7.18 A World without statistics 764

Proof Theory

We have chosen to begin the study of Applied Mathematics with the theory that seems to us the most fundamental and important in the field of pure and exact sciences: Proof Theory. Proof Theory and propositional calculus (logic) have five objectives in this book:
1. Teach the reader how to reason and demonstrate (prove), independently of the field of specialization.
2. Show that the process of a demonstration (proof) is independent of the language used.
3. Prepare the reader for Logic Theory (see section Logic Systems).
4. Pave the way to Gödel's incompleteness theorem (main goal of this section!).
5. Prepare the reader for Automata Theory (see section Automata Theory).

Gödel's theorem is probably the most exciting point: if we define a religion as a system of thought that contains unprovable statements, and therefore elements of faith, then Gödel tells us not only that mathematics is a religion, but that it is the only religion that can prove it is one!

Remarks
R1.
It is (very) strongly advised to read this section in parallel with those on Automata Theory and Logical Systems (including Boolean Algebra) in the Theoretical Computing chapter of this book.
R2. Proof Theory should be approached as a likeable curiosity that basically brings little beyond working and reasoning methods. Moreover, its purpose is not to show that everything is provable, but that any proof can be written in a common language starting from a finite number of rules.

Often, when students arrive in graduate classes, they have learned how to calculate or use algorithms, but little or not at all how to reason. For reasoning of any kind, visual support is a powerful tool (a picture is worth a thousand words), and people who do not see that the solution appears when a given curve or straight line is drawn, or who cannot visualize in space, are really penalized. In high school we already manipulate unknown objects, but mainly to perform calculations, and when we reason about objects represented by letters, we can visually replace them by a real number, a vector, etc. Beyond a certain level, people are asked to reason about more abstract structures and therefore to work on unknown objects that are elements of a set that is itself unknown, for example the elements of an arbitrary group (see section Set Theory). The visual support then no longer exists.

We very often ask students to reason and to prove properties, but almost no one has ever taught them how to reason properly, how to write proofs, or how to check proofs. If we ask a graduate student what a proof is, he will most likely have some difficulty answering. He may say that it is a text containing keywords such as "therefore", "because", "if", "if and only if", "take an x such that", "assume", "lemma", "theorem", "let us look for a contradiction", etc.
But he will probably be unable to give the grammar or the basic rules of these texts, and his teachers, unless they have taken a course in Proof Theory, would probably be unable to do so either. To understand this situation, remember that a child does not need to know grammar in order to speak. He imitates those around him, and it works very well: most six-year-old children can use complicated sentences without ever having studied grammar. Most teachers do not know the grammar of reasoning either but, for them, the imitation process has worked well, and so they reason correctly. The experience of most university teachers shows that this imitation process works well for very good students, for whom it is enough, but much less well, if at all, for many others. As long as the level of complexity is low (especially in "equational" reasoning), grammar is almost useless, but when the level increases, or when we do not understand why something is wrong, some grammar becomes necessary in order to progress.

Teachers and students are familiar with the following situation: in an assignment, the grader strikes out a whole page with a large red line and writes "false" in the margin. When the student asks what is wrong, the grader can only say things like "this has nothing to do with the requested proof" or "nothing is right", which obviously does not help the student understand. This is partly because the student's text uses the appropriate words, but in a more or less random way, so that no meaning can be given to the way they are assembled. In addition, the teacher does not have the tools to explain what is wrong. We must therefore give him these tools! They exist, but they are fairly recent. Proof Theory is a branch of mathematical logic that originated in the foundations crisis: there were doubts about what one had the "right" to do in mathematical reasoning (see the "Foundations Crisis" below).
Paradoxes appeared, and it then became necessary to clarify the rules of proof and to verify that these rules are not contradictory. This theory appeared in the early 20th century, which is very recent, since most of the mathematics taught in the first years of university has been known since the 16th-17th centuries.

1.0.1 Foundations Crisis

For the Greek philosophers, geometry was the highest form of knowledge, a powerful key to the metaphysical mysteries of the universe. It was rather a mystical belief, and the link between mysticism and religion was made explicit in cults such as that of the Pythagoreans. No culture since then has deified a man for discovering a geometrical theorem! Later, mathematics was regarded as the model of a priori knowledge in the Aristotelian tradition of rationalism.

The Greek philosophers' wonder at mathematics has not left us; we find it in the traditional metaphor of mathematics as the "Queen of the Sciences". It was strengthened by the spectacular success of mathematical models in science, a success that the Greeks (who did not even know simple algebra) had not anticipated. Since Isaac Newton's discovery of integral calculus and of the inverse-square law of gravity in the late 1600s, the phenomenal sciences and higher mathematics have remained in close symbiosis, to the point that a predictive mathematical formalism became the hallmark of a "hard science". After Newton, over the next two centuries, science aspired to the kind of rigour and purity that seemed inherent in mathematics.
The metaphysical question seemed simple: mathematics seemed to provide perfect a priori knowledge, and among all the sciences, those that could be most thoroughly mathematized were the most effective at predicting phenomena. Perfect knowledge therefore lay in a mathematical formalism that, once reached by science and embracing all aspects of reality, could ground a posteriori empirical knowledge on a priori rational logic. It was in this spirit that Marie Jean-Antoine Nicolas de Caritat, Marquis de Condorcet (French philosopher and mathematician), imagined describing the entire Universe as a set of partial differential equations to be solved one after the other.

The first break in this inspiring picture appeared in the second half of the 19th century, when Riemann and Lobachevsky separately proved that Euclid's parallel axiom could be replaced by other axioms that produced "consistent" geometries (we will come back to this word further below). Riemann's geometry was modelled on a sphere, Lobachevsky's on a hyperboloid of revolution. The impact of this discovery was later overshadowed by greater upheavals, but at the time it struck the intellectual world like a thunderclap. The existence of mutually inconsistent axiomatic systems, each of which could be a model of the phenomenal Universe, called the entire relation between mathematics and theoretical physics into question.

When we knew only Euclid, there was only one possible geometry. One could believe that Euclid's axioms (see section Euclidean Geometry) were a kind of perfect a priori knowledge of geometry in the phenomenal world. But suddenly there were three geometries, an embarrassment for metaphysical subtleties. On what basis should we choose among the axioms of plane, spherical and hyperbolic geometry as the real description?
Because all three are consistent, none of them can be chosen a priori as a foundation: the choice must be empirical, based on predictive power in a given situation.

Of course, theoretical physicists had long been accustomed to choosing a formalism to study a scientific problem. But it was widely accepted, if only unconsciously, that the need to do so stemmed from human ignorance, and that with logic or sufficiently good mathematics one could infer the right choice from first principles and produce a priori descriptions of reality, to be confirmed afterwards by empirical verification.

However, Euclidean geometry, seen for hundreds of years as the model of the axiomatic perfection of mathematics, had been dethroned. If we could not know a priori something as basic as the geometry of space, what hope was there for a purely rational theory that would encompass all of nature? Psychologically, Riemann and Lobachevsky had struck at the heart of the mathematical enterprise as it had been conceived until then.

Moreover, Riemann and Lobachevsky called the nature of mathematical intuition into question. It had been easy to believe implicitly that mathematical intuition was a form of perception, a way of glimpsing the Platonic world behind reality. But with two other geometries pushing Euclid's to its limits, no one could ever be sure of knowing what the world really looks like.

Mathematicians responded to this dual problem with excessive rigour, trying to apply the axiomatic method to all of mathematics. In the pre-axiomatic period, proofs were often based on commonly accepted intuitions of the "reality" of mathematics, which could no longer automatically be regarded as valid. The new way of thinking about mathematics led to a series of spectacular successes. Yet this also had a price: the axiomatic method made the connection between mathematics and phenomenal reality increasingly tenuous.
Meanwhile, discoveries suggested that mathematical axioms that appeared to be consistent with phenomenal experience could lead to dizzying contradictions with this experience.

Most mathematicians quickly became "formalists", arguing that pure mathematics could only be regarded as a kind of elaborate philosophical game played with symbols on paper (this is the view behind Robert Heinlein's prophetic description of mathematics as a "zero-content system"). The old-fashioned "Platonic" belief in the reality of mathematical objects seemed fit only for the trash, even though mathematicians still feel like Platonists while actually discovering mathematics.

Philosophically, then, the axiomatic method led most mathematicians to abandon earlier beliefs in the metaphysical specificity of mathematics. It also produced the contemporary rupture between pure and Applied Mathematics. Most of the great mathematicians of the early modern period (Newton, Leibniz, Fourier, Gauss and others) also worked in phenomenal science. The axiomatic method had hatched the modern idea of the pure mathematician as a great aesthete, heedless of physics. Ironically, formalism gave pure mathematicians a bad case of the Platonic attitude. Applied mathematicians stopped mixing with physicists and learned to trail behind them.

This takes us to the early 20th century. For the beleaguered minority of Platonists, the worst was yet to come. Cantor, Frege, Russell and Whitehead showed that all of pure mathematics could be built on the simple foundation of the axioms of Set Theory. This suited the formalists well: mathematics was being reunified, at least in principle, on the basis of a small set of rules. The Platonists were satisfied too: if a great coherent structure emerged as the keystone of all mathematics, the metaphysical specificity of mathematics could still be saved.
In a negative way, though, a Platonist had the last word. Kurt Gödel threw a spanner into the formalist axiomatization program when he proved that any axiom system powerful enough to include the integers must be either inconsistent (contain contradictions) or incomplete (too weak to decide the truth or falsity of some statements of the system).

And that is more or less where things stand today. Mathematicians know that many attempts to advance mathematics as a priori knowledge of the Universe must face numerous paradoxes and the impossibility of deciding which axiom system describes real mathematics. They have been reduced to hoping that the standard axiomatizations are not inconsistent but merely incomplete, and to wondering anxiously what contradictions or unprovable theorems are waiting to be discovered.

However, on the empirical front, mathematics has always been a spectacular success as a tool for theoretical construction. The great successes of 20th-century physics (General Relativity and Quantum Physics) lay so far outside the realm of physical intuition that they could only be understood by meditating deeply on their mathematical formalism and following it to its logical conclusions, even when those conclusions seemed wildly bizarre. What irony! Just as mathematical perception was appearing less and less reliable in pure mathematics, it was becoming more and more indispensable in phenomenal science.

Against this background, the applicability of mathematics to phenomenal science poses a more difficult problem than first appears. The relation between mathematical models and the prediction of phenomena is complex, not only in practice but also in principle. More complex still, as we now know, there are mutually exclusive ways of axiomatizing mathematics! But why should there be only one good choice of mathematical model?
That is, why is there a mathematical formalism, for example for quantum physics, so productive that it predicts the discovery of new observable particles? To answer this question, we can observe that "works" is itself a kind of definition. For many phenomenal systems, no such exact predictive formalism has been found, and none seems plausible. Examples are easy to find: the climate, or the behaviour of any economy larger than that of a town, systems so chaotically interdependent that exact prediction is actually impossible (not only in practice but in principle).

1.1 Paradoxes

Since ancient times, logicians have noticed the presence of many paradoxes within rationality. In fact, we can say that despite their number, these paradoxes are merely illustrations of a few paradoxical structures. For general culture, let us look at the most famous ones, which make up the class of "undecidable propositions".

Example: The paradox of the class of classes (Russell)
There are two types of classes: those that contain themselves (or reflexive classes: the class of non-empty sets, the class of classes, ...) and those that do not contain themselves (or non-reflexive classes: the class of homework to be handed in, the class of blood oranges, ...). The question is the following: is the class of non-reflexive classes itself reflexive or non-reflexive? If it is reflexive, it contains itself, so it belongs to the class of non-reflexive classes and must therefore be non-reflexive, which is contradictory. If it is non-reflexive, it must be included in the class of non-reflexive classes, that is, in itself, and it becomes ipso facto reflexive: again a contradiction.

Russell's paradox is mainly known in the following two variants:
• Does the set of all sets that do not contain themselves contain itself? The answer is: if "Yes", then "No", and if "No", then "Yes"...
• The barber shaves those, and only those, who do not shave themselves. So who shaves the barber? If the barber shaves himself, he belongs to the category of people who shave themselves, so, being the barber, he does not shave himself... But if he does not shave himself, he belongs to the category of people who are shaved by the barber, so he shaves himself... The answer is again undecidable...

Russell's paradox challenges the notion of a set as a collection defined by a common property! In one stroke it undermines logic (undecidable proposition) and naive set theory... because the concept of the set of all sets is an impossibility!!! Self-reference is at the center of this logical problem! This paradox also raises the question of whether a correctly (logically) formulated mathematical question necessarily has an answer. Put another way: is every mathematical statement provable? It was Gödel who, many years after Russell stated his paradox, proved mathematically that the answer is No!!! In other words, there will always be unanswered questions, because any system (living language or mathematical tool) founded on itself is necessarily incomplete! This is the famous consequence of Gödel's incompleteness theorem!

Let us look at another illustration of Russell's paradox:

Example: In a library there are two types of catalogues: those that mention themselves and those that do not. A librarian must draw up a catalogue of all the catalogues that do not mention themselves. Having completed his work, the librarian wonders whether or not to mention the very catalogue he is drafting. At this point he is struck with perplexity. If he does not mention this catalogue, it will be a catalogue that does not mention itself, and it should therefore be included in the list of catalogues that do not mention themselves.
On the other hand, if he mentions it, this catalogue becomes a catalogue that mentions itself and must therefore not be included, since it is the catalogue of catalogues that do not mention themselves.

A variation of the previous paradox is the well-known liar paradox:

Example: Let us provisionally define lying as the act of stating a false proposition. The Cretan poet Epimenides said: "All Cretans are liars"; this is the proposition P. How can we decide whether P is true? If P is true, then, since Epimenides is a Cretan, P must be false. P must therefore be false in order to be true, which is contradictory.

As the logician Ludwig Wittgenstein might have put it, these paradoxes ultimately show that mathematics is a very good tool for showing logic, but not for talking about it. Using mathematics to give these algebraic entities an independent existence is madness, and it is precisely this that produces monsters such as the set of all sets... Logic is empty and cannot say what reality is; it is limited to being a picture of it.

1.1.1 Hypothetical-Deductive Reasoning

As we know (see the Introduction of the book), hypothetical-deductive reasoning is the learner's ability to deduce conclusions from pure hypotheses, and not only from actual observations. It is a thought process that seeks to identify a causal explanation for a phenomenon (we will come back to this during our first steps in physics).
The learner who uses this type of reasoning begins with a hypothesis and then tries to prove or disprove it following the block diagram below:

Figure 4.1 - Hypothetical-Deductive Reasoning block diagram (hypothesis, process, observations, empirical changes)

The deductive procedure consists in provisionally holding as true this first proposition, which in logic we name a "predicate" (see further below for more details), and in drawing all its logically necessary consequences, that is to say, in looking for its implications.

Example: Consider the proposition P: "X is a man"; it implies the proposition Q: "X is mortal". The expression $P \Rightarrow Q$ (if X is a human, X is necessarily mortal) is a predicative implication (hence the term "predicate"). In this example there is no case where P holds without Q. It is an example of strict implication, as found in the "syllogism" (a figure of logical reasoning).

Remark: Experts have shown that hypothetical-deductive reasoning develops gradually in children from the age of six or seven, and that, starting from a strict propositional function, it comes to be used systematically by the age of eleven or twelve.

1.2 Propositional Calculus

"Propositional calculus" (or "propositional logic") is an absolutely indispensable prerequisite for studying science, philosophy, law, politics, economics, etc. This type of calculus provides decision or testing procedures, which help determine when a logical expression (proposition) is true, and especially whether it is always true.

Definitions (#3):
D1. An expression that is always true, whatever the content of the variables that compose it, is named a "valid expression", a "tautology" or a "law of propositional logic".
D2. An expression that is always false is named a "contradiction" or an "antilogy".
D3. An expression that is sometimes true and sometimes false is named a "contingent expression".
D4. An "assertion" is an expression of which we can say unambiguously whether it is true or false.
D5. The "object language" is the language used to write logical expressions.
D6. The "meta-language" is the (everyday) language used to talk about the object language.

Remarks
R1. Some expressions are actually not assertions. For example, the statement "this statement is false" is a paradox that can be neither true nor false.
R2. Consider a logical expression A. If it is a tautology, we frequently write $\models A$, and $A \models$ if it is a contradiction.
R3. In mathematics we try to prove in a general way that an assertion is true, but no such general argument is needed to show that it is false: in that case a single counterexample is enough.

1.2.1 Propositions (premises)

Definition (#4): In logic, a "proposition" is a statement that has meaning. That means we can say unambiguously whether this statement is true (T) or false (F). This is what we name the "Law of excluded middle".

Examples:
E1. "I am lying" is not a proposition (premise). If we assume that this statement is true, it asserts its own falsity, so we must conclude that it is false. But if we assume that it is false, then its author is not lying, so he is telling the truth, and the proposition would be true...
E2. Another funny example is:
• Everything has a creator
• God is that creator
• God does not have a creator
This is a solution that fails, since it violates its own premise...

Definition (#5): A proposition in binary logic (where propositions are either true or false) is therefore never true and false at the same time. This is what we call the "principle of non-contradiction".
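Definitions D1 to D3 suggest a mechanical test: evaluate an expression under every possible assignment of truth values to its variables. The sketch below does exactly that; it assumes, for illustration only, that an expression is written as a Python function of booleans (the name `classify` is hypothetical, not from the text). It checks the law of excluded middle, the principle of non-contradiction and an ordinary contingent expression.

```python
from itertools import product


def classify(expr, n_vars):
    """Classify a propositional expression as in definitions D1-D3.

    Assumption of this sketch: the expression is encoded as a Python function
    taking n_vars booleans, rather than parsed from a formula string.
    """
    # Evaluate the expression under every one of the 2**n_vars truth assignments
    values = [bool(expr(*row)) for row in product([True, False], repeat=n_vars)]
    if all(values):
        return "tautology"       # D1: true under every assignment
    if not any(values):
        return "contradiction"   # D2: false under every assignment
    return "contingent"          # D3: sometimes true, sometimes false


print(classify(lambda p: p or not p, 1))    # excluded middle -> tautology
print(classify(lambda p: p and not p, 1))   # non-contradiction -> contradiction
print(classify(lambda p, q: p and q, 2))    # depends on P and Q -> contingent
```

Brute-force enumeration is only practical for a handful of variables (the table has $2^n$ rows), but that is precisely the "truth table" method the text relies on.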
Thus, a property on a set E is a map P from E to the set of truth values {T, F}:

$$P: E \to \{T, F\} \tag{1.1}$$

We speak of the "associated subset" E' of E: the portion of E on which the property is true (and, conversely, every subset of E defines a property).

Example: In $E = \mathbb{N}$, if $P(x)$ states "x is even", then the associated subset is $\{0, 2, 4, \ldots, 2k, \ldots\}$, which is only a part of E but has the same cardinality (see section Set Theory).

Definition (#6): Let P be a property on the set E. A property Q on E is a "negation" of P if and only if, for any $x \in E$:
• $Q(x)$ is F (false) if $P(x)$ is T (true)
• $Q(x)$ is T (true) if $P(x)$ is F (false)

We can gather these conditions in a table called a "truth table":

P   Q
T   F
F   T
Table 4.1 - Truth table of values

The same table can also be written in the more explicit form:

P       Q
True    False
False   True
Table 4.2 - Truth table of explicit values

or in binary form:

P   Q
1   0
0   1
Table 4.3 - Truth table of binary values

In other words, P and Q always have opposite truth values. We denote the statement "Q is a negation of P" by:

$$Q \Leftrightarrow \neg P \tag{1.2}$$

where the symbol $\neg$ is the "negation connector".

Remark: Expressions must be well-formed formulas (often abbreviated "WFF"). By definition, any variable is a well-formed formula, and if P is a well-formed formula then so is $\neg P$. If P and Q are well-formed formulas, then $P \Rightarrow Q$ is a well-formed formula (the expression "I am lying" is not well-formed because it contradicts itself).

1.2.2 Connectors

There are other logical connectors.

Definition (#7): Let P and Q be two properties defined on the same set E.
$P \lor Q$ (read "P OR Q") is a property on E defined by:
• $P \lor Q$ is true if at least one of P, Q is true
• $P \lor Q$ is false otherwise

We can build the truth table of the "OR connector" or "disjunction connector" $\lor$:

P   Q   P ∨ Q
T   T   T
T   F   T
F   T   T
F   F   F
Table 4.4 - OR truth table

It is easy to convince yourself that if the subsets $\mathcal{P}$, $\mathcal{Q}$ of E are respectively associated with the properties P, Q, then $\mathcal{P} \cup \mathcal{Q}$ (see section Set Theory) is associated with $P \lor Q$:

$$P \mapsto \mathcal{P}, \quad Q \mapsto \mathcal{Q} \quad \Longrightarrow \quad P \lor Q \mapsto \mathcal{P} \cup \mathcal{Q} \tag{1.3}$$

The connector $\lor$ is associative (and obviously commutative!). To prove it, just build a truth table in which you can check that:

$$[P \lor (Q \lor R)] \Leftrightarrow [(P \lor Q) \lor R] \tag{1.4}$$

Definition (#8): There is also the "AND connector", also named the "conjunction connector" $\land$: for any two properties P, Q defined on E, $P \land Q$ is a property defined on E by:
• $P \land Q$ is true if both properties P, Q are true (the famous syllogism "All men are mortal, Socrates is a man, therefore Socrates is mortal" is a well-known example).
• $P \land Q$ is false otherwise

We can build the truth table of the "AND connector" or "conjunction connector" $\land$:

P   Q   P ∧ Q
T   T   T
T   F   F
F   T   F
F   F   F
Table 4.5 - AND truth table

It is just as easy to convince yourself that if the subsets $\mathcal{P}$, $\mathcal{Q}$ of E are respectively associated with the properties P, Q, then $\mathcal{P} \cap \mathcal{Q}$ (see section Set Theory) is associated with $P \land Q$:

$$P \mapsto \mathcal{P}, \quad Q \mapsto \mathcal{Q} \quad \Longrightarrow \quad P \land Q \mapsto \mathcal{P} \cap \mathcal{Q} \tag{1.5}$$

The connector $\land$ is associative (and obviously commutative!). To prove it, just build a truth table in which you can check that:

$$[P \land (Q \land R)] \Leftrightarrow [(P \land Q) \land R] \tag{1.6}$$

The connectors $\lor$ and $\land$ are each distributive over the other.
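The correspondence stated above between connectors and set operations ($\lor$ with $\cup$, $\land$ with $\cap$) can be checked on a concrete example. The sketch below is only an illustration under a loud assumption: since $E = \mathbb{N}$ cannot be enumerated, it uses the finite stand-in $E = \{0, \ldots, 19\}$, with "x is even" and "x is a multiple of 3" as the two properties (the helper names are hypothetical). It also checks the associativity laws (1.4) and (1.6) on all eight rows of the truth table.

```python
from itertools import product

# Finite stand-in for E: the text uses E = N, which cannot be enumerated,
# so this sketch assumes E = {0, 1, ..., 19}.
E = range(20)


def subset_of(prop):
    """Associated subset: the elements of E on which the property is true."""
    return {x for x in E if prop(x)}


def is_even(x):
    return x % 2 == 0


def is_multiple_of_3(x):
    return x % 3 == 0


P_set = subset_of(is_even)
Q_set = subset_of(is_multiple_of_3)

# (1.3): P OR Q is associated with the union of the two subsets
assert subset_of(lambda x: is_even(x) or is_multiple_of_3(x)) == P_set | Q_set
# (1.5): P AND Q is associated with their intersection
assert subset_of(lambda x: is_even(x) and is_multiple_of_3(x)) == P_set & Q_set

# (1.4) and (1.6): associativity of OR and AND, checked on all 8 rows
for p, q, r in product([True, False], repeat=3):
    assert (p or (q or r)) == ((p or q) or r)
    assert (p and (q and r)) == ((p and q) and r)

print(sorted(P_set & Q_set))  # [0, 6, 12, 18]
```

A finite check like this illustrates the correspondence but does not prove it for all of $\mathbb{N}$; the general argument is the element-wise reasoning given in the text.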
Using a simple truth table, we can show that (the author can provide the truth table on request):

$$[P \lor (Q \land R)] \Leftrightarrow [(P \lor Q) \land (P \lor R)] \tag{1.7}$$

as well as:

$$[P \land (Q \lor R)] \Leftrightarrow [(P \land Q) \lor (P \land R)] \tag{1.8}$$

Definition (#9): The "negation" operator $\neg$ transforms a True value into a False value and vice versa:

$$\neg T = F \tag{1.9}$$

$$\neg F = T \tag{1.10}$$

In logic, negation (also named "logical complement") is an operation that takes a proposition $P$ to another proposition "not $P$". This is written $\neg P$, or sometimes $\overline{P}$. It is interpreted intuitively as True when $P$ is False, and False when $P$ is True. Negation is thus a unary (single-argument) logical connective.

We will prove them in detail in the section on Logical Systems, using a simple truth table. "De Morgan's laws" provide a way of distributing negation over disjunction and conjunction:

$$\neg(P \land Q) \Leftrightarrow [(\neg P) \lor (\neg Q)] \tag{1.11}$$

$$\neg(P \lor Q) \Leftrightarrow [(\neg P) \land (\neg Q)] \tag{1.12}$$

Remark: For the details of all logical operators, see the section on Logical Systems (chapter Theoretical Computing). There, identity, double negation, idempotence, associativity, distributivity and the De Morgan relations are presented more formally and in full detail.

Let us now come back to the "logical implication connector", sometimes simply named the "conditional", denoted by the symbol $\Rightarrow$.

Remark: Some books on propositional calculus denote this connector by the symbol $\supset$. In proof theory we often prefer the symbol $\rightarrow$.

Let $P$ and $Q$ be two properties given on $E$. $P \Rightarrow Q$ is a property on $E$ defined by:

P1. $P \Rightarrow Q$ is False if $P$ is True and $Q$ is False.
P2. $P \Rightarrow Q$ is True otherwise.

In other words, "$P$ logically implies $Q$" means that $Q$ is True under every assessment for which $P$ is True. The implication is therefore the famous "if ... then ...". Here is its truth table (pay particular attention to the next-to-last line!):

| $P$ | $Q$ | $P \Rightarrow Q$ |
|---|---|---|
| T | T | T |
| T | F | F |
| F | T | T |
| F | F | T |

Table 4.6 - Implication truth table

In other words, when the antecedent is False, the implication is True whatever the conclusion. When the antecedent is True, the implication is True only if the conclusion is True.

Example: Consider the promise "If you get your diploma, I will buy you a computer". Of all the cases, only one corresponds to a broken promise: the child graduates and still has no computer (second line of the table above). We write this promise as "You get your degree $\Rightarrow$ I buy you a computer". It means exactly this:

- If you graduate, I will certainly buy you a computer (I cannot fail to buy it).
- If you do not graduate, I have promised nothing.

The implication therefore tells us that from a false proposition we can deduce any proposition (last two lines).

Example: Russell once taught a course showing that any proposition can be inferred from a false one. A student asked him the following question (anecdote or legend?):

- "Are you saying that from the proposition 2 + 2 = 5 it follows that you are the Pope?"
- "Yes", answered Russell.
- "Could you prove it?", asked the skeptical student...
- "Certainly", answered Russell, who immediately offered the following proof:

1. Suppose that 2 + 2 = 5.
2. Subtract 3 from each side of the equality: we get 1 = 2.
3. By symmetry, 2 = 1.
4. The Pope and I are 2. Since 2 = 1, the Pope and I are 1. It follows that I am the Pope.

The implication connector is essential in mathematics, philosophy, etc. It is the backbone of any proof or deduction.
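Distributivity (1.7) and (1.8), De Morgan's laws (1.11) and (1.12), and Table 4.6 can all be produced mechanically. The following Python sketch does this. The function `implies` is a helper introduced here for illustration. It encodes P1/P2 directly: the implication is False only when the antecedent is True and the consequent False.

```python
from itertools import product

BOOLS = (True, False)


def implies(p: bool, q: bool) -> bool:
    """P1/P2: P => Q is False only when P is True and Q is False."""
    return not (p and not q)


def is_tautology(formula, n_vars: int) -> bool:
    """True if formula holds for every one of the 2**n_vars assignments."""
    return all(formula(*values) for values in product(BOOLS, repeat=n_vars))


laws = {
    "(1.7)": (lambda p, q, r: (p or (q and r)) == ((p or q) and (p or r)), 3),
    "(1.8)": (lambda p, q, r: (p and (q or r)) == ((p and q) or (p and r)), 3),
    "(1.11)": (lambda p, q: (not (p and q)) == ((not p) or (not q)), 2),
    "(1.12)": (lambda p, q: (not (p or q)) == ((not p) and (not q)), 2),
}
results = {name: is_tautology(f, n) for name, (f, n) in laws.items()}
print(results)  # every law maps to True

# Table 4.6, rows in the order TT, TF, FT, FF
implication_table = [(p, q, implies(p, q)) for p, q in product(BOOLS, repeat=2)]
for row in implication_table:
    print(" ".join("T" if v else "F" for v in row))
# T T T
# T F F
# F T T
# F F T
```

`itertools.product((True, False), repeat=2)` yields the rows in the same order as the printed table, so the output can be compared line by line with Table 4.6.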
The implication has the following useful properties (normally easy to check with a small truth table):

$$(P \Rightarrow Q) \Leftrightarrow [(\neg Q) \Rightarrow (\neg P)] \tag{1.14}$$

$$(P \Rightarrow Q) \Leftrightarrow [(\neg P) \lor Q] \tag{1.15}$$

From the last property we also get (again verifiable by a truth table):

$$\neg(P \Rightarrow Q) \Leftrightarrow [P \land (\neg Q)] \tag{1.16}$$

The "logical equivalence connector", or "biconditional connector", is most of the time denoted by $\Leftrightarrow$ (sometimes $\leftrightarrow$). By definition it means:

$$(P \Leftrightarrow Q) \Leftrightarrow [(P \Rightarrow Q) \land (Q \Rightarrow P)] \tag{1.17}$$

In other words, the first expression has the same value as the second under every evaluation. The same holds for the following relation, which is more "atomic": it reduces the equivalence to $\land$, $\lor$ and negation $\neg$ only (a combination of what we have seen above):

$$(P \Leftrightarrow Q) \Leftrightarrow [(P \land Q) \lor (\neg P \land \neg Q)] \tag{1.18}$$

When we prove such an equivalence between two expressions, we can therefore say that "we prove that the equivalence is a tautology". The truth table of the equivalence is:

| $P$ | $Q$ | $P \Rightarrow Q$ | $Q \Rightarrow P$ | $P \Leftrightarrow Q$ |
|---|---|---|---|---|
| T | T | T | T | T |
| T | F | F | T | F |
| F | T | T | F | F |
| F | F | T | T | T |

Table 4.7 - Truth table of the equivalence

When it is true, $P \Leftrightarrow Q$ means that "$P$ and $Q$ always have the same truth value", or "$P$ and $Q$ are equivalent". It is True if $P$ and $Q$ have the same value, and False otherwise. Of course, the following is a tautology:

$$\neg(P \Leftrightarrow Q) \Rightarrow (P \Rightarrow \neg Q) \tag{1.19}$$

The relation $P \Leftrightarrow Q$ amounts to saying two things. $P$ is a necessary and sufficient condition for $Q$, and $Q$ is a necessary and sufficient condition for $P$.

Conditions of the types "necessary", "sufficient" and "necessary and sufficient" can therefore be reformulated with the terms "only if", "if" and "if and only if":

1. $P \Rightarrow Q$ means that $Q$ is a necessary condition for $P$. In other words, $P$ is True only if $Q$ is True: in the rows of the truth table where $P \Rightarrow Q$ equals 1, $P$ is 1 only when $Q$ is also 1. We also say that if $P$ is true then $Q$ is true.
2. $P \Leftarrow Q$, or equivalently $Q \Rightarrow P$, means that $Q$ is a sufficient condition for $P$. In other words, $P$ is True if $Q$ is True: in the rows where $Q \Rightarrow P$ equals 1, $P$ is 1 whenever $Q$ is 1.
3. $P \Leftrightarrow Q$ means that $Q$ is a necessary and sufficient condition for $P$. In other words, $P$ is True if and only if $Q$ is True: in the rows where $P \Leftrightarrow Q$ equals 1, $P$ is 1 exactly when $Q$ is 1.

Remark: The expression "if and only if" therefore corresponds to a logical equivalence. It must be used only to describe a bi-implication!

The first stage of propositional calculus is the formalization of natural-language statements. To do this, propositional calculus provides three types of tools:

1. The "propositional variables" ($P$, $Q$, $R$, ...) symbolize arbitrary simple propositions. If the same variable occurs several times, it symbolizes the same proposition each time.
2. The five logical operators: $\neg$, $\land$, $\lor$, $\Rightarrow$, $\Leftrightarrow$.
3. Punctuation, reduced to opening and closing parentheses, which organize the reading so as to avoid ambiguity.

The core operators are summarized below (description, symbol, usage):

Negation: "Negation" is an operator that acts on a single proposition; it is unary, or monadic. "It is not raining" is written $\neg P$. This statement is true if and only if $P$ is false (here, if it is false that it is raining). The conventional use of negation is characterized by the double negation law: $\neg\neg P$ is equivalent to $P$. Symbol: $\neg$; usage: $\neg P$.

Conjunction: The "conjunction", or "logical product", is a binary operator: it connects two propositions. "Every man is mortal AND my car loses oil" is written $P \land Q$. This expression is true if and only if $P$ is true and $Q$ is true. Symbol: $\land$; usage: $P \land Q$.

Disjunction: The "disjunction", or "logical sum", is also a binary operator. $P \lor Q$ is true if and only if $P$ is true OR $Q$ is true, or both are true. The OR can be understood in two ways: inclusively or exclusively.
In the first case, $P \lor Q$ is true if $P$ is true, if $Q$ is true, or if both are true. In the second case, $P \lor Q$ is true if $P$ is true or if $Q$ is true, but not if both are. The disjunction of propositional calculus is the inclusive OR. The exclusive OR, i.e. the XOR, is given the name "alternative". Symbol: $\lor$; usage: $P \lor Q$.

Implication: The "implication" is also a binary operator. It corresponds roughly to the linguistic pattern "If ... then ...". "If I have time, I will go to the movies" is written $P \Rightarrow Q$. This relation is false if $P$ is true and $Q$ is false. If the consequent (here $Q$) is true, the implication $P \Rightarrow Q$ is true. When the antecedent (here $P$) is false, the implication is always true. This last point becomes clearer with statements such as "If we could put Paris in a bottle, the Eiffel Tower would be used as a cork". In summary, an implication is false if and only if its antecedent is true and its consequent is false. Symbol: $\Rightarrow$; usage: $P \Rightarrow Q$.

Bi-implication: The "bi-implication", or "equivalence", $\Leftrightarrow$ is also a binary operator. It symbolizes the phrases "... if and only if ..." and "... is equivalent to ...". The equivalence of two propositions is true if they have the same truth value. The bi-implication therefore expresses a form of identity, which is why it is often used in definitions. Symbol: $\Leftrightarrow$; usage: $P \Leftrightarrow Q$.

Table 4.8 - Summary of logical core operators

Some authors like to use as little natural language as possible. They sometimes place the symbol $\therefore$ before a logical consequence, such as the conclusion of a syllogism. The symbol consists of three dots arranged in an upright triangle and is read "therefore". The symbol $\because$ (three dots in an inverted triangle) is also used, and is read "because".

Example: $\because$ All men are mortal, $\because$ Socrates is a man, $\therefore$ Socrates is mortal.

In this book we will avoid this notation, as engineers rarely use it.
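The properties (1.14) to (1.19), the difference between inclusive and exclusive OR, and Table 4.7 can be checked with one small Python sketch. The helpers `implies` and `iff` are illustrative names introduced here, each defined by its truth table. The "converse" entry is included on purpose: it is not a law. It shows that $P \Rightarrow Q$ does not give $Q \Rightarrow P$, which is exactly why "necessary" and "sufficient" must be kept apart.

```python
from itertools import product

BOOLS = (True, False)


def implies(p: bool, q: bool) -> bool:
    return not (p and not q)      # False only for P True, Q False


def iff(p: bool, q: bool) -> bool:
    return p == q                 # True when P and Q have the same truth value


def is_tautology(formula) -> bool:
    return all(formula(p, q) for p, q in product(BOOLS, repeat=2))


laws = {
    "(1.14)": lambda p, q: implies(p, q) == implies(not q, not p),
    "(1.15)": lambda p, q: implies(p, q) == ((not p) or q),
    "(1.16)": lambda p, q: (not implies(p, q)) == (p and not q),
    "(1.17)": lambda p, q: iff(p, q) == (implies(p, q) and implies(q, p)),
    "(1.18)": lambda p, q: iff(p, q) == ((p and q) or (not p and not q)),
    "(1.19)": lambda p, q: implies(not iff(p, q), implies(p, not q)),
    # Not a law: P => Q does not give its converse Q => P.
    "converse": lambda p, q: implies(p, q) == implies(q, p),
    # The exclusive OR ("alternative") is the negation of the equivalence.
    "xor = not iff": lambda p, q: (p != q) == (not iff(p, q)),
}
results = {name: is_tautology(f) for name, f in laws.items()}

# Inclusive and exclusive OR differ only on the row where P and Q are both True.
or_vs_xor = [(p, q) for p, q in product(BOOLS, repeat=2) if (p or q) != (p != q)]

# Table 4.7: columns P, Q, P => Q, Q => P, P <=> Q
table_4_7 = [(p, q, implies(p, q), implies(q, p), iff(p, q)) for p, q in product(BOOLS, repeat=2)]

print(results)    # every entry True except "converse"
print(or_vs_xor)  # [(True, True)]
```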
It is possible to establish equivalences between these operators. We have already seen that the biconditional can be defined as the conjunction of two reciprocal conditionals. Here are some other equivalences:

$$(P \Rightarrow Q) \Leftrightarrow \neg(P \land \neg Q)$$

$$(P \Rightarrow Q) \Leftrightarrow (\neg P \lor Q)$$

$$(P \lor Q) \Leftrightarrow (\neg P \Rightarrow Q)$$

$$(P \land Q) \Leftrightarrow \neg(P \Rightarrow \neg Q)$$

Remark: The classical operators $\land$, $\lor$ and $\Leftrightarrow$ can therefore be defined from the canonical operators $\neg$ and $\Rightarrow$ through these equivalence laws.

Also note the two De Morgan relations (see the section on Boolean Algebra for the proof):

$$\neg(P \lor Q) \Leftrightarrow (\neg P \land \neg Q)$$

$$\neg(P \land Q) \Leftrightarrow (\neg P \lor \neg Q)$$

They allow us to transform a disjunction into a conjunction and vice versa:

$$(P \lor Q) \Leftrightarrow \neg(\neg P \land \neg Q) \tag{1.21}$$

$$(P \land Q) \Leftrightarrow \neg(\neg P \lor \neg Q) \tag{1.22}$$

1.2.3 Decision procedures

So far we have introduced the basic elements for building expressions from properties (propositional variables), but said little about how to handle such expressions. In propositional calculus there are two ways to establish that a proposition is a law of propositional logic. We can either:

1. Use non-axiomatic procedures (truth tables)
2. Use axiomatic and demonstrative procedures

Remark: Many books present these procedures before the structure of the propositional language. We chose the opposite order here, thinking that this approach would be easier.

1.2.3.1 Non-axiomatic decision procedures

Several such methods exist. We will limit ourselves here to the simplest one, "matrix calculation", often referred to as the "method of truth tables". As we have already seen, the construction procedure is quite simple. The truth value of a complex expression is a function of the truth values of the simple statements that compose it. Ultimately, it depends only on the truth values of the propositional variables it contains.
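The claim that the truth value of a complex expression depends only on the truth values of its components can be sketched as a recursive evaluator. The encoding below is a hypothetical choice made for illustration only. A variable is a string, `("not", A)` is a negation, and `(op, A, B)` is a binary connective with `op` one of `"and"`, `"or"`, `"->"`, `"<->"`.

```python
# Hypothetical encoding (assumption): a propositional variable is a string,
# ("not", A) is a negation and (op, A, B) with op in {"and", "or", "->", "<->"}
# is a binary connective.
BINARY = {
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
    "->": lambda a, b: (not a) or b,
    "<->": lambda a, b: a == b,
}


def evaluate(expr, assignment: dict) -> bool:
    """Truth value of expr, computed only from the truth values of its components."""
    if isinstance(expr, str):              # propositional variable
        return assignment[expr]
    if expr[0] == "not":
        return not evaluate(expr[1], assignment)
    op, left, right = expr
    return BINARY[op](evaluate(left, assignment), evaluate(right, assignment))


# (P -> Q) <-> (not P or Q), evaluated for P = True, Q = False
expr = ("<->", ("->", "P", "Q"), ("or", ("not", "P"), "Q"))
print(evaluate(expr, {"P": True, "Q": False}))  # True
```

The evaluator only ever looks at one connective at a time. The value of the whole expression is therefore fixed once the variables in `assignment` are fixed, which is what makes the exhaustive enumeration of truth tables possible.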
By considering all possible combinations of truth values of the propositional variables, we can determine the truth values of the complex expression. As we have seen, truth tables let us decide, for any proposition, whether it is a tautology (always true), a contradiction (always false) or a contingent expression (sometimes true, sometimes false).

We can distinguish at least four ways of combining propositional variables, brackets and connectors:

| # | Name | Description | Example |
|---|---|---|---|
| 1 | Malformed statement | Nonsense: neither true nor false | $(\lor P)Q$ |
| 2 | Tautology | Statement always true | $P \lor \neg P$ |
| 3 | Contradiction | Statement always false | $P \land \neg P$ |
| 4 | Contingent statement | Statement sometimes true, sometimes false | $P \lor Q$ |

Table 4.9 - Combination of propositional variables

The method of truth tables determines which of these types a given well-formed expression belongs to. In principle it requires no invention: it is "only" a mechanical procedure. Axiomatic procedures, however, are not entirely mechanical. Inventing a proof within an axiomatic system sometimes requires skill, practice or luck.

For truth tables, here is the protocol to follow:

When facing a well-formed expression, or truth function, we first determine how many distinct propositional variables we are dealing with. We then examine the various arguments that make up the expression. Next, we build a table with $2^n$ rows, $n$ being the number of variables (do not forget that they are binary variables!). It has one column per argument, plus columns for the expression itself and its other components (see previous examples). We then assign to the variables every possible combination of True (1) and False (0) values. Each row corresponds to a possible outcome, and the rows together form the set of all possible outcomes. For example, there is an outcome in which $P$ is true while $Q$ is false.
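The protocol above can be sketched in a few lines of Python. A truth function of $n$ variables is represented here as a Python function of $n$ booleans; this is an assumption of the sketch. The sketch builds the $2^n$ rows and classifies the expression. It also uses the classifier to confirm that the equivalences defining $\lor$ and $\land$ through $\neg$ and $\Rightarrow$ are tautologies. A malformed statement such as $(\lor P)Q$ cannot even be written as a Python lambda: the parser rejects it, so the procedure never sees it.

```python
from itertools import product


def truth_table(formula, n_vars: int):
    """All 2**n_vars rows as (assignment, value) pairs."""
    return [(row, formula(*row)) for row in product((True, False), repeat=n_vars)]


def classify(formula, n_vars: int) -> str:
    """Decide mechanically: tautology, contradiction or contingent."""
    values = [value for _, value in truth_table(formula, n_vars)]
    if all(values):
        return "tautology"
    if not any(values):
        return "contradiction"
    return "contingent"


def implies(p: bool, q: bool) -> bool:
    return not (p and not q)


# Examples of Table 4.9
print(classify(lambda p: p or not p, 1))     # tautology
print(classify(lambda p: p and not p, 1))    # contradiction
print(classify(lambda p, q: p or q, 2))      # contingent

# Equivalences expressing OR and AND through NOT and IMPLIES
print(classify(lambda p, q: (p or q) == implies(not p, q), 2))         # tautology
print(classify(lambda p, q: (p and q) == (not implies(p, not q)), 2))  # tautology
```

Note that $P \lor \neg Q$ is classified as contingent: it is false for $P$ False and $Q$ True.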
1.2.3.2 Axiomatic decision procedures

Axiomatizing a theory means more than formalizing it. We start from a finite number of axioms and, through controlled transformations of these, obtain all the theorems of the theory. The axioms are asserted, not proven. We then fix deduction rules for manipulating the axioms, or the expressions obtained from them. A sequence of such deductions is a proof, which leads to a theorem, law or lemma.

We will now briefly present two axiomatic systems. Each consists of axioms together with two specific rules named "inference rules" (intuitive rules):

1. The "modus ponens": if we have proved $A$ and $A \Rightarrow B$, then we can deduce $B$. $A$ is named the "minor premise" and $A \Rightarrow B$ the "major premise" of the modus ponens rule.

Example: From:

$$x > y \tag{1.23}$$

and:

$$(x > y) \Rightarrow (y < x) \tag{1.24}$$

we can deduce that:

$$y < x \tag{1.25}$$

Remark: Humans typically communicate in a way that resists shallow logical analysis. In a real conversation, people use words rather than terms, make utterances rather than sentences, and employ a wider variety of inference methods than modus ponens. Much of what is communicated and inferred in a conversation depends on context. This includes the speakers and audience, their history, their shared knowledge and confidences, and the feelers they put out to establish mutual trust and rapport.

2. The "substitution": in an axiom schema we can replace a letter by any formula, provided that all identical letters are replaced by identical formulas!

Let us give two axiomatic systems as examples: that of Whitehead and Russell, and that of Łukasiewicz.

(a) The axiomatic system of Russell and Whitehead adopts $\neg$ and $\lor$ as primitive symbols. It defines $\Rightarrow$, $\land$ and $\Leftrightarrow$ from them as follows (relations easily verified with truth tables):

$$(A \Rightarrow B) \Leftrightarrow (\neg A \lor B)$$

$$(A \land B) \Leftrightarrow \neg(\neg A \lor \neg B)$$

$$(A \Leftrightarrow B) \Leftrightarrow [(A \Rightarrow B) \land (B \Rightarrow A)] \tag{1.26}$$

This system includes 5 axioms, all fairly obvious, plus the two rules of inference. As Whitehead and Russell did, we state the axioms using non-primitive symbols:

A1. $(A \lor A) \Rightarrow A$
A2. $B \Rightarrow (A \lor B)$
A3. $(A \lor B) \Rightarrow (B \lor A)$
A4. $(A \lor (B \lor C)) \Rightarrow (B \lor (A \lor C))$
A5. $(B \Rightarrow C) \Rightarrow ((A \lor B) \Rightarrow (A \lor C))$

We have already met some of these above. For example, to prove that $\neg A \lor A$ is a theorem, we can proceed as follows:

| Step | Formula | Justification |
|---|---|---|
| (1) | $B \Rightarrow (A \lor B)$ | Axiom A2 |
| (2) | $A \Rightarrow (A \lor A)$ | (1) and substitution |
| (3) | $(B \Rightarrow C) \Rightarrow ((A \lor B) \Rightarrow (A \lor C))$ | Axiom A5 |
| (4) | $(B \Rightarrow C) \Rightarrow ((\neg A \lor B) \Rightarrow (\neg A \lor C))$ | (3) and substitution |
| (5) | $(B \Rightarrow C) \Rightarrow ((A \Rightarrow B) \Rightarrow (A \Rightarrow C))$ | (4) and property of $\Rightarrow$ |
| (6) | $((A \lor A) \Rightarrow A) \Rightarrow ((A \Rightarrow (A \lor A)) \Rightarrow (A \Rightarrow A))$ | (5) and substitution |
| (7) | $(A \Rightarrow (A \lor A)) \Rightarrow (A \Rightarrow A)$ | (6), axiom A1 and modus ponens |
| (8) | $A \Rightarrow (A \lor A)$ | (2) |
| (9) | $A \Rightarrow A$ | (7), (8) and modus ponens |
| (10) | $\neg A \lor A$ | (9) and property of $\Rightarrow$ |

(b) The axiomatic system of Łukasiewicz includes three axioms, plus the two rules of inference (modus ponens and substitution):

A1. $(A \Rightarrow B) \Rightarrow ((B \Rightarrow C) \Rightarrow (A \Rightarrow C))$
A2. $A \Rightarrow (\neg A \Rightarrow B)$
A3. $(\neg A \Rightarrow A) \Rightarrow A$

Here is the proof of the first two axioms of Łukasiewicz in the system of Russell and Whitehead. They are formulas (6) and (16) of the following derivation:

| Step | Formula | Justification |
|---|---|---|
| (1) | $(A \lor (B \lor C)) \Rightarrow (B \lor (A \lor C))$ | Axiom A4 |
| (2) | $(\neg(B \Rightarrow C) \lor (\neg(A \lor B) \lor (A \lor C))) \Rightarrow (\neg(A \lor B) \lor (\neg(B \Rightarrow C) \lor (A \lor C)))$ | (1) and substitution |
| (3) | $\neg(A \lor B) \lor ((B \Rightarrow C) \Rightarrow (A \lor C))$ | Axiom A5, (2) and modus ponens |
| (4) | $(A \lor B) \Rightarrow ((B \Rightarrow C) \Rightarrow (A \lor C))$ | (3) and property of $\Rightarrow$ |
| (5) | $(\neg A \lor B) \Rightarrow ((B \Rightarrow C) \Rightarrow (\neg A \lor C))$ | (4) and substitution |
| (6) | $(A \Rightarrow B) \Rightarrow ((B \Rightarrow C) \Rightarrow (A \Rightarrow C))$ | (5) and property of $\Rightarrow$ |
| (7) | $(B \Rightarrow (A \lor B)) \Rightarrow (((A \lor B) \Rightarrow (B \lor A)) \Rightarrow (B \Rightarrow (B \lor A)))$ | (6) and substitution |
| (8) | $((A \lor B) \Rightarrow (B \lor A)) \Rightarrow (B \Rightarrow (B \lor A))$ | (7), axiom A2 and modus ponens |
| (9) | $B \Rightarrow (B \lor A)$ | (8), axiom A3 and modus ponens |
| (10) | $\neg B \Rightarrow (\neg B \lor A)$ | (9) and substitution |
| (11) | $\neg\neg B \lor (\neg B \lor A)$ | (10) and property of $\Rightarrow$ |
| (12) | $(\neg\neg B \lor (\neg B \lor A)) \Rightarrow (\neg B \lor (\neg\neg B \lor A))$ | Axiom A4 and substitution |
| (13) | $\neg B \lor (\neg\neg B \lor A)$ | (12), (11) and modus ponens |
| (14) | $B \Rightarrow (\neg\neg B \lor A)$ | (13) and property of $\Rightarrow$ |
| (15) | $B \Rightarrow (\neg B \Rightarrow A)$ | (14) and property of $\Rightarrow$ |
| (16) | $A \Rightarrow (\neg A \Rightarrow B)$ | (15) and substitution |

These axiomatizations let us obtain, as theorems, all the tautologies (laws) of propositional logic.

From everything said so far, we can now try to define what a proof is!

Definition (#10): A finite sequence of formulas $B_1, B_2, \ldots, B_m$ is named a "proof" from the assumptions (hypotheses) $A_1, A_2, \ldots, A_n$ if, for each $i$:

- $B_i$ is one of the hypotheses $A_1, A_2, \ldots, A_n$
- or $B_i$ is a variant of an axiom
- or $B_i$ is inferred by modus ponens from the major premise $B_j$ and the minor premise $B_k$, where $j, k < i$ (that is, $B_j$ is $B_k \Rightarrow B_i$)
- or $B_i$ is inferred by substitution from an earlier premise $B_j$, the replaced variable not appearing in $A_1, A_2, \ldots, A_n$

Such a sequence, whose final formula is $B_m$, is more explicitly named a "proof of $B_m$" from the hypotheses (or axioms) $A_1, A_2, \ldots, A_n$. We also write:

$$A_1, A_2, \ldots, A_n \vdash B_m \tag{1.27}$$

More informally, a proof is a deductive argument for a mathematical statement. The argument may use other previously established statements, such as theorems. In principle, a proof can be traced back to self-evident or assumed statements, known as axioms. Proofs employ logic, but usually also include some natural language, which admits some ambiguity. In fact, the vast majority of proofs in written mathematics can be considered applications of rigorous informal logic. Purely formal proofs, written in symbolic language instead of natural language, are studied in proof theory.
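Definition (#10) is mechanical enough to be checked by a program, at least partially. The following Python sketch is a hypothetical proof checker, not the author's method. It accepts a line if it is a hypothesis, an axiom instance listed explicitly, or follows from two earlier lines by modus ponens. The substitution rule is deliberately left out. The encoding (strings for atoms, `("->", A, B)` for $A \Rightarrow B$) and the names `imp` and `check_proof` are assumptions introduced for illustration.

```python
from typing import Sequence, Tuple, Union

# Hypothetical encoding (assumption): atoms are strings and ("->", A, B) stands for A => B.
Formula = Union[str, Tuple]


def imp(a: Formula, b: Formula) -> Formula:
    return ("->", a, b)


def check_proof(lines: Sequence[Formula], hypotheses: Sequence[Formula],
                axioms: Sequence[Formula] = ()) -> bool:
    """Definition (#10) restricted to hypotheses, explicitly listed axiom instances
    and modus ponens; the substitution rule is deliberately not implemented."""
    for i, b in enumerate(lines):
        earlier = list(lines[:i])
        if b in hypotheses or b in axioms:
            continue
        # Modus ponens: an earlier minor premise B_k whose major premise B_k => b is also earlier.
        if any(imp(minor, b) in earlier for minor in earlier):
            continue
        return False
    return True


# Example (1.23)-(1.25)
hyps = ["x > y", imp("x > y", "y < x")]
print(check_proof(["x > y", imp("x > y", "y < x"), "y < x"], hyps))  # True
print(check_proof(["y < x"], hyps))  # False: neither a hypothesis nor obtained by modus ponens

# A two-step chain: A, A => B, B => C  |-  C
chain_hyps = ["A", imp("A", "B"), imp("B", "C")]
chain = ["A", imp("A", "B"), "B", imp("B", "C"), "C"]
print(check_proof(chain, chain_hyps))  # True
```

Checking a proof is easy. Finding one (for instance the 16-step derivation above) is the part that, as noted earlier, requires skill, practice or luck.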
Remark: When we try to prove a result from a number of assumptions, we do not try to prove the assumptions themselves!

1.2.4 Quantifiers

To solve certain problems, we must complete the connectors of propositional calculus with what we name "quantifiers". Propositional calculus does not allow us, for example, to state general things about the elements of a set. In that sense, propositional logic covers only part of reasoning. Predicate calculus, on the contrary, can formally handle statements such as "there exists an $x$ such that [$x$ has an American car]" or "for all $x$ [if $x$ is a dachshund, then $x$ is small]". In short, we extend compound formulas so that they can express existential quantification ("there exists...") and universal quantification ("for every...").

The examples above involve rather special propositions such as "$x$ has an American car". This is a proposition with a variable. Such a proposition is in fact the application to $x$ of the function that associates "$x$ has an American car" with $x$. We denote this function by "... has an American car". We say that it is a propositional function, because its value is a proposition. As we already know, it is also named a "predicate".

The existential and universal quantifiers go hand in hand with the use of propositional functions. Predicate calculus is, however, limited in the existential and universal formulas it allows. We forbid ourselves sentences like "there is a property of $x$ such that ...". In fact, we allow ourselves to quantify only over "individuals". This is why predicate logic is named "first-order logic": its variables range over basic mathematical objects (in second-order logic they can also range over sets).

[Figure: First-Order Logic / Second-Order Logic]

Before turning to the study of predicate calculus, we define:

D1. The "universal quantifier": $\forall$ (for all)
D2. The "existential quantifier": $\exists$ (there exists)

Example: "Every complex number is the product of a non-negative real number and a number of modulus 1" is written:

$$\forall z \in \mathbb{C},\ \exists \rho \in \mathbb{R}_+,\ \exists \omega \in \mathbb{C} : (|\omega| = 1 \land z = \rho\,\omega) \tag{1.28}$$

The order of quantifiers is critical to meaning, as the following two propositions illustrate:

For every natural number $n$, there exists a natural number $s$ such that $s = n^2$.

This is clearly true! It just asserts that every natural number has a square. Swapping the quantifiers gives an assertion with a different meaning:

There exists a natural number $s$ such that for every natural number $n$, $s = n^2$.

This is clearly false! It asserts that a single natural number $s$ is at the same time the square of every natural number.

A frequent question in physics and mathematics is whether quantifiers must be written before or after the predicates they refer to. Strictly in terms of formal logic, quantifiers are always written at the beginning of the formula they govern. However, almost no one writes a proof in the formal language: even simple proofs would be very long and unreadable. On the other hand, anyone, whatever natural language they speak, will interpret a sentence of the formal language in the same way. The price of this clarity is, of course, readability. Natural languages, because of their inherent ambiguity, are subject to many more limitations. The choice between formal and more informal notation therefore depends on the context of presentation. The audience we are communicating an idea to should guide the level of formal notation we use.

We sometimes use the symbol $\exists!$ to say briefly: "there is one and only one".
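On a finite domain, quantifiers become loops: $\forall$ corresponds to Python's `all` and $\exists$ to `any`. This makes the effect of swapping them easy to observe. The ranges below are assumptions chosen as finite stand-ins for $\mathbb{N}$. The helper `exists_unique` and the $\mathbb{Z}/7\mathbb{Z}$ examples are also hypothetical illustrations of $\exists!$, since a statement over $\mathbb{R}$ cannot be checked by enumeration.

```python
# Finite stand-ins (assumption): n ranges over 0..9 and s over 0..99, so that every
# square n**2 needed by the first statement is available.
N_RANGE = range(10)
S_RANGE = range(100)

# For every n there exists s such that s = n^2
forall_exists = all(any(s == n ** 2 for s in S_RANGE) for n in N_RANGE)

# There exists s such that for every n, s = n^2
exists_forall = any(all(s == n ** 2 for n in N_RANGE) for s in S_RANGE)


def exists_unique(domain, predicate) -> bool:
    """'There is one and only one y in domain with predicate(y)' (finite domains only)."""
    return sum(1 for y in domain if predicate(y)) == 1


# Hypothetical illustration on Z/7Z: every x has exactly one y with 3y = x (mod 7) ...
Z7 = range(7)
times_3_bijective = all(exists_unique(Z7, lambda y, x=x: (3 * y) % 7 == x) for x in Z7)
# ... but not with y^2 = x (mod 7): 2 = 3^2 = 4^2 has two roots and 3 has none.
squaring_bijective = all(exists_unique(Z7, lambda y, x=x: (y * y) % 7 == x) for x in Z7)

print(forall_exists, exists_forall, times_3_bijective, squaring_bijective)
# True False True False
```

The nesting order of `all` and `any` is exactly the order of the quantifiers. Swapping the two loops turns a true statement into a false one.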
Example: A classic example is the statement that the natural logarithm is a bijection from $\mathbb{R}_+^*$ onto $\mathbb{R}$:

$$\forall x \in \mathbb{R},\ \exists! y \in \mathbb{R}_+^* : x = \ln(y) \tag{1.29}$$

We will now see that Proof Theory and Set Theory are the exact transcription of the principles and results of Logic (with a capital "L").

1.3 Predicate Calculus

In mathematics courses (algebra, analysis, geometry, ...), we prove properties of different types of objects: integers, reals, matrices, sequences, continuous functions, curves, triangles, ... To prove these properties, the objects we work on must of course be clearly defined (what is a set, a real number, a point, ...?).

In first-order logic, and in particular in proof theory, the objects we study are formulas and proofs. We must therefore give a precise definition of these objects. Terms and formulas form the grammar of a language. This language is deliberately simplified, designed precisely to say what we want without ambiguity and without unnecessary detours.

1.3.1 Grammar

Definitions (#11):

D1. The "terms" designate the objects whose properties we want to prove (we will discuss them in much more detail further below):

- In algebra, the terms refer to the elements of a set (group, ring, field, vector space, etc.). We also manipulate sets of objects (subgroups, subrings, subfields, etc.); the terms designating these objects are named "second-order terms".
- In analysis, the terms most of the time refer to real numbers or, for example in function spaces, to functions.

D2. The "formulas" are the properties of the objects we study (we will also discuss them in much more detail further below):

- In algebra, we can write formulas expressing that two elements commute, that a subspace has dimension 3, etc.
- In analysis, we write formulas expressing the continuity of a function, the convergence of a sequence, etc.
- In set theory, formulas can express the inclusion of one set in another, the membership of an element in a set, etc.

D3. The "proofs" allow us to check whether a formula is true. The precise meaning of this word will also need to be defined. More precisely, proofs are deductions under assumptions: they "lead from truth to truth". The truth of the conclusion is thus reduced to the truth of the hypotheses. That question is not a matter of logic; it depends on our knowledge of the things we are talking about.

1.3.2 Language

In mathematics we use different languages depending on the area, distinguished by the symbols they use. The definition below simply expresses that a language is fully specified by its list of symbols.

Definition (#12): A "language" is given by a family (not necessarily finite) of symbols. We distinguish three kinds of expressions: symbols, terms and formulas.

Remarks:

R1. The words "vocabulary" or "signature" are sometimes used instead of "language".

R2. We already know that the word "predicate" is used instead of the word "relation". We then speak of "predicate calculus" instead of "first-order logic".

1.3.2.1 Symbols

There are different types of symbols, which we now define:

D1. The "constant symbols" (see the remarks below).

Example: The neutral element $n$ in Set Theory (see section Set Theory).

D2. The "function symbols", or "functors". To each function symbol we assign a strictly positive integer named its "arity": the number of arguments of the function. If the arity is 1 (resp. 2, ..., $n$), we say that the function is unary (resp. binary, ..., $n$-ary).

Example: The binary functor of multiplication $\times$ or $\cdot$ in group theory (see section Set Theory).

D3. The "relation symbols".
As in the previous definition, every relation symbol is associated with a non-negative integer (its arity), which corresponds to its number of arguments. We then speak of unary, binary, ..., $n$-ary relations.

Example: The relation $=$ is a binary relation (see section Set Theory).

D4. The "individual variables". In what follows, we give ourselves an infinite set $V$ of variables. As is traditional, variables are written with lowercase Latin letters: $x, y, z$ (possibly indexed: $x_1, x_2, x_3$).

D5. To these we add the connectors $\land$, $\lor$ (and the other connectors seen above) and the quantifiers $\forall$, $\exists$, $\exists!$. We discussed them extensively above, so there is no need to return to them.

Remarks:

R1. A constant symbol can be seen as a function symbol with 0 arguments (arity zero).

R2. Unless otherwise stated, we consider that every language contains two particular relation symbols. The first is the binary relation symbol $=$ (read "equal"). The second is the relation symbol with zero arguments denoted $\bot$ (read "bottom" or "absurd"), which represents, as we already know, the value FALSE. In the description of a language, we will often omit to mention them. The symbol $\bot$ is often redundant: we can write a formula that is always false without it. However, it represents FALSE in a canonical way, which makes it possible to write general proof rules.

R3. Functions and relations play very different roles. As we will see, function symbols are used to build terms (the objects of the language). Relation symbols are used to build formulas (the properties of these objects).

1.3.2.2 Terms

Terms, also named "first-order terms", are the objects associated with the language.

D1. Given a language $\mathcal{L}$, the set $\mathcal{T}$ of terms on $\mathcal{L}$ is the smallest set that contains the variables and the constants and is stable (we never leave the set) under applying the function symbols of $\mathcal{L}$ to terms.

D2. A "closed term" is a term that contains no variables (it is built only from constants and function symbols).

D3. For a more formal definition, we can write:

$$\mathcal{T}_0 = \{t \mid t \text{ is a variable or a constant symbol}\} \tag{1.30}$$

and, for any $k \in \mathbb{N}$:

$$\mathcal{T}_{k+1} = \mathcal{T}_k \cup \{f(t_1, \ldots, t_n) \mid t_i \in \mathcal{T}_k\} \tag{1.31}$$

where $f$ ranges over the function symbols, $n$ being the arity of $f$ (recall that the arity is the number of arguments of the function). We thus obtain an increasing sequence of sets of terms, and finally:

$$\mathcal{T} = \bigcup_{k \in \mathbb{N}} \mathcal{T}_k \tag{1.32}$$

D4. We name "height" of a term $t$ the smallest $k$ such that $t \in \mathcal{T}_k$.

This definition means that variables and constants are terms. It also means that if $f$ is an $n$-ary function symbol and $t_1, \ldots, t_n$ are terms, then $f(t_1, \ldots, t_n)$ is itself a term. The set $\mathcal{T}$ of terms is defined by the grammar:

$$\mathcal{T} = V \mid \mathcal{S}_c \mid \mathcal{S}_f(\mathcal{T}, \ldots, \mathcal{T}) \tag{1.33}$$

This expression reads as follows. An element of the set $\mathcal{T}$ we are defining is one of three things:

- an element of $V$ (a variable)
- an element of $\mathcal{S}_c$ (the set of constant symbols)
- the application of an $n$-ary function symbol $f \in \mathcal{S}_f$ to $n$ elements of $\mathcal{T}$

Caution: The fact that $f$ has the right arity is only implicit in this notation. Moreover, writing $\mathcal{S}_f(\mathcal{T}, \ldots, \mathcal{T})$ does not mean that all the arguments of the function are identical. It only means that they are elements of $\mathcal{T}$.

Remark: It is often convenient to see a term (expression) as a tree. Each node is labeled with a function symbol (operator or function) and each leaf with a variable or a constant.

In what follows, we will almost always define concepts (or prove results) "by induction" on the structure or the size of a term.

Definitions (#13):

D1. To prove a property $P$ on terms, it suffices to prove $P$ for the variables and constants, and to prove $P(f(t_1, \ldots, t_n))$ from $P(t_1), \ldots, P(t_n)$. This is a "proof by induction on the height of a term", a technique we will meet again in the following chapters. Mathematical induction as an inference rule can be formalized as a second-order axiom.
In logical symbols, the axiom of induction is:

$$\forall P.\ \big[P(0) \land \forall (k \in \mathbb{N}).\ [P(k) \Rightarrow P(k+1)]\big] \Rightarrow \forall (n \in \mathbb{N}).\ P(n) \tag{1.34}$$

In words, the basis $P(0)$ and the inductive step together imply that $P(n)$ holds for every natural number $n$. The inductive step is the statement that the inductive hypothesis $P(k)$ implies $P(k+1)$. The axiom of induction asserts that this inference, from the basis and the inductive step to $P(n)$ for every $n$, is valid. Induction can be compared to falling dominoes: whenever one domino falls, the next one also falls. The first step, proving that $P(0)$ is true, starts the infinite chain reaction.

D2. To define a function $\Phi$ on terms, it is enough to define it on the variables and constants, and to say how $\Phi(f(t_1, \ldots, t_n))$ is obtained from $\Phi(t_1), \ldots, \Phi(t_n)$. This is again a "definition by induction on the height of a term".

Example: The size (also named the "length") of a term $t$, denoted $\tau(t)$, is the number of function symbols occurring in $t$. Formally:

- $\tau(x) = \tau(c) = 0$ if $x$ is a variable and $c$ is a constant
- $\tau(f(t_1, \ldots, t_n)) = 1 + \sum_{1 \le i \le n} \tau(t_i)$

where the 1 in the last relation accounts for the symbol $f$ itself.

1.3.2.3 Formulas

Definition (#14): A "well-formed formula" (WFF), often simply "formula", is a word that belongs to a formal language. A word here means a finite sequence of symbols from a given alphabet. A formal language can be identified with the set containing all and only its formulas.

The formulas of propositional calculus, also named "propositional formulas", are expressions such as $(A \land (B \lor C))$. An atomic formula is a formula that contains no logical connectives and no quantifiers; equivalently, it has no strict subformulas. The precise form of atomic formulas depends on the formal system under consideration. In propositional logic, for example, the atomic formulas are the propositional variables.
For predicate logic, the atoms are predicate symbols together with their arguments, each argument being a term.
[Figure: nested sets: symbols and strings of symbols, well-formed formulas, theorems]
Definition (#15): Formulas are built from "atomic formulas" using connectors and quantifiers. We will use the following connectors and quantifiers (which we already know):
• Unary negation connector: $\neg$
• Binary connectors of conjunction, disjunction and implication: $\wedge$, $\vee$, $\to$
• Quantifiers: $\exists$, which must be read "there exists", and $\forall$, which must be read "for all"
This notation of the connectors is almost standard (or at least it should be). It is used to avoid confusion between the formulas and the current language (the metalanguage).
Definitions (#16):
D1. Given a language $\mathcal{L}$, the "atomic formulas" of $\mathcal{L}$ are the formulas of the form $R(t_1,\ldots,t_n)$, where $R$ is an $n$-ary relation symbol of $\mathcal{L}$ and $t_1,\ldots,t_n$ are terms of $\mathcal{L}$. We denote by "Atom" the set of all atomic formulas. If we denote by $S_r$ the set of relation symbols, we can write:
$$\text{Atom}=S_r(T,\ldots,T)=\{R(t_1,\ldots,t_n)\mid R\in S_r\text{ of arity }n,\ t_i\in T\}\tag{1.35}$$
The set $\mathcal{F}$ of formulas of the first-order logic of $\mathcal{L}$ is thus defined by the grammar (where $x$ is a variable):
$$\mathcal{F}=\text{Atom}\mid\mathcal{F}\wedge\mathcal{F}\mid\mathcal{F}\vee\mathcal{F}\mid\mathcal{F}\to\mathcal{F}\mid\neg\mathcal{F}\mid\exists x\,\mathcal{F}\mid\forall x\,\mathcal{F}\tag{1.36}$$
which should be read: the set of formulas is the smallest set containing the atomic formulas and such that, if $F_1$ and $F_2$ are formulas, then $F_1\vee F_2$, etc., are also formulas.
The reader must be careful not to confuse terms and formulas: $\sin(x)$ is a term (function), $x=3$ is a formula. But $\sin(x)\wedge(x=3)$ is nothing: we cannot, in fact, put a connector between a term and a formula (it is meaningless).
Remarks
R1. To define a function $\Phi$ on formulas, we simply need to define $\Phi$ on the atomic formulas and to tell how $\Phi$ of a formula is obtained from $\Phi$ of its immediate components, for each connector and quantifier.
R2. To prove a property $P$ on formulas, it suffices to prove $P$ for the atomic formulas and to prove that $P$ is preserved by each connector and quantifier.
R3.
To prove a property $P$ on formulas, it is enough to assume that the property holds for all formulas of size $p<n$ and to prove it for the formulas of size $n$.
D2. A "sub-formula" of a formula (or expression) $F$ is one of its components, i.e. a formula from which $F$ is built. Formally, we define the set $SF(F)$ of the sub-formulas of $F$ by:
• If $F$ is atomic:
$$SF(F)=\{F\}\tag{1.37}$$
• If $F=F_1\odot F_2$ (that is to say a composition) with $\odot\in\{\vee,\wedge,\to\}$:
$$SF(F)=\{F\}\cup SF(F_1)\cup SF(F_2)\tag{1.38}$$
• If $F=\neg F_1$ or $F=Qx\,F_1$ with $Q\in\{\forall,\exists\}$:
$$SF(F)=\{F\}\cup SF(F_1)\tag{1.39}$$
D3. A formula $F$ of $\mathcal{L}$ uses only a finite number of symbols of $\mathcal{L}$. This subset is named the "language of the formula" and is denoted by $\mathcal{L}(F)$.
D4. The "size (or length) of a formula $F$", denoted by $\tau(F)$, is the number of connectors or quantifiers occurring in $F$. Formally:
• $\tau(F)=0$ if $F$ is an atomic formula
• $\tau(F_1\odot F_2)=1+\tau(F_1)+\tau(F_2)$, where once again $\odot\in\{\vee,\wedge,\to\}$
• $\tau(\neg F_1)=\tau(Qx\,F_1)=1+\tau(F_1)$, with once again $Q\in\{\forall,\exists\}$
D5. The "main operator" (we also say the "main connector") of a formula is defined as follows:
• If $A$ is atomic, it has no main operator
• If $A=\neg B$, then $\neg$ is the main operator of $A$
• If $A=B\odot C$, where once again $\odot\in\{\vee,\wedge,\to\}$, then $\odot$ is the main operator of $A$
• If $A=Qx\,B$, where once again $Q\in\{\forall,\exists\}$, then $Q$ is the main operator of $A$
D6. Let $F$ be a formula. The set $VL(F)$ of free variables of $F$ and the set $VM(F)$ of dummy variables (or "bound variables") of $F$ are defined by induction on $\tau(F)$. An occurrence of a variable in a formula $F$ is named "bound" or "dummy" if a quantifier of $F$ refers to it. Otherwise, we say it is a "free" occurrence.
Remark: An occurrence of a variable $x$ in a formula $F$ is a position of the variable in the formula $F$. Do not confuse it with the variable itself!
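The inductive definitions D2 (sub-formulas), D4 (size) and D5 (main operator) can be sketched the same way, each one following the case analysis of the grammar (1.36). In this Python illustration, formulas are nested tuples whose first element names the operator; the tags `"atom"`, `"not"`, `"and"`, `"or"`, `"imp"`, `"forall"`, `"exists"` and the helper `components` are a hypothetical encoding chosen for the example.

```python
# Hypothetical encoding of formulas as nested tuples:
#   ("atom", R, (t1, ..., tn))          atomic formula R(t1, ..., tn)
#   ("not", F1)                         negation
#   ("and" | "or" | "imp", F1, F2)      binary connectors
#   ("forall" | "exists", x, F1)        quantifiers
BINARY = {"and", "or", "imp"}
QUANTIFIERS = {"forall", "exists"}


def components(F):
    """Immediate sub-formulas of F (empty for an atom)."""
    op = F[0]
    if op == "atom":
        return []
    if op in BINARY:
        return [F[1], F[2]]
    if op == "not":
        return [F[1]]
    if op in QUANTIFIERS:
        return [F[2]]
    raise ValueError(f"unknown operator {op!r}")


def formula_size(F) -> int:
    """tau(F): number of connectors and quantifiers (D4)."""
    if F[0] == "atom":
        return 0
    return 1 + sum(formula_size(G) for G in components(F))


def subformulas(F) -> set:
    """SF(F): F itself plus the sub-formulas of its components (D2)."""
    result = {F}
    for G in components(F):
        result |= subformulas(G)
    return result


def main_operator(F):
    """Main operator of F (D5), or None for an atomic formula."""
    return None if F[0] == "atom" else F[0]


# (A and (B or C)) with A, B, C propositional variables (0-ary atoms)
A, B, C = ("atom", "A", ()), ("atom", "B", ()), ("atom", "C", ())
F = ("and", A, ("or", B, C))
print(formula_size(F), len(subformulas(F)), main_operator(F))  # 2 5 and
```

Factoring the case analysis into `components` keeps each definition as short as its mathematical counterpart.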
To make the possible free variables of a formula $F$ explicit, we write $F[x_1,\ldots,x_n]$. This means that the free variables of $F$ are among $x_1,\ldots,x_n$, i.e. if $y$ is free in $F$, then $y$ is one of the $x_i$, but the $x_i$ do not necessarily appear in $F$.
We can define the free and dummy variables more formally:
(a) If $F=R(t_1,\ldots,t_n)$ is atomic, then $VL(F)$ is the set of variables appearing in the $t_i$, and for the dummy variables we have $VM(F)=\varnothing$.
(b) If $F=F_1\odot F_2$ where $\odot\in\{\vee,\wedge,\to\}$: $VL(F)=VL(F_1)\cup VL(F_2)$ and $VM(F)=VM(F_1)\cup VM(F_2)$.
(c) If $F=\neg F_1$, then $VL(F)=VL(F_1)$ and $VM(F)=VM(F_1)$.
(d) If $F=Qx\,F_1$ with $Q\in\{\forall,\exists\}$: $VL(F)=VL(F_1)-\{x\}$ and $VM(F)=VM(F_1)\cup\{x\}$.
Examples:
E1. Given $F:\ \forall x\,(x\cdot y=y\cdot x)$, then $VL(F)=\{y\}$ and $VM(F)=\{x\}$.
E2. Given $G:\ (\forall x\,\exists y\,(x\cdot z=z\cdot y))\wedge(x=z\cdot z)$, then $VL(G)=\{x,z\}$ and $VM(G)=\{x,y\}$.
D7. We say that formulas $F$ and $G$ are "$\alpha$-equivalent" if they become (syntactically) identical after renaming their bound variables.
D8. A "closed formula" is a formula without free variables.
D9. Given a formula $F$, a variable $x$ and a term $t$, $F[x:=t]$ is the formula obtained by replacing in $F$ all free occurrences of $x$ by $t$, after a possible renaming of the bound variables of $F$ that occur free in $t$.
Remarks
R1. We notice in the examples above that a variable can have both free occurrences and bound occurrences (in E2, $x$ is both free and bound). So we do not always have $VL(F)\cap VM(F)=\varnothing$.
R2. We cannot rename $y$ into $x$ in the formula $\forall y\,(x\cdot y=y\cdot x)$ and get the formula $\forall x\,(x\cdot x=x\cdot x)$: the free variable $x$ would then be "captured". We therefore cannot rename variables without precautions: we must avoid the capture of free occurrences.
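Rules (a) to (d) can be checked on examples E1 and E2 with a short recursive function. The Python sketch below assumes a hypothetical encoding: a term is either a variable name or a tuple `(f, t1, ..., tn)`, and a formula is a tagged tuple such as `("forall", x, F1)`. The helper `renaming_captures` is also hypothetical and only tests the situation of remark R2 (a free variable that would be captured); a full capture-avoiding renaming needs more care.

```python
# Hypothetical encoding (an assumption for this sketch):
#   term:    a variable name (str) or a tuple (f, t1, ..., tn)
#   formula: ("atom", R, (t1, ..., tn)), ("not", F1),
#            ("and" | "or" | "imp", F1, F2), ("forall" | "exists", x, F1)


def term_vars(t) -> set:
    """Variables occurring in a term."""
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))


def free_and_bound(F):
    """Return the pair (VL(F), VM(F)) following rules (a) to (d)."""
    op = F[0]
    if op == "atom":                                   # rule (a)
        return set().union(*(term_vars(t) for t in F[2])), set()
    if op in ("and", "or", "imp"):                     # rule (b)
        vl1, vm1 = free_and_bound(F[1])
        vl2, vm2 = free_and_bound(F[2])
        return vl1 | vl2, vm1 | vm2
    if op == "not":                                    # rule (c)
        return free_and_bound(F[1])
    if op in ("forall", "exists"):                     # rule (d): F = Qx F1
        vl1, vm1 = free_and_bound(F[2])
        return vl1 - {F[1]}, vm1 | {F[1]}
    raise ValueError(f"unknown operator {op!r}")


def renaming_captures(F, new: str) -> bool:
    """For F = Qy F1: renaming y into `new` captures a free occurrence
    (remark R2) when `new` is already free in F. Only this check is done."""
    return new in free_and_bound(F)[0]


def times(a, b):
    return ("*", a, b)      # the product a . b as a binary function symbol


# E1: forall x (x.y = y.x)
E1 = ("forall", "x", ("atom", "=", (times("x", "y"), times("y", "x"))))
# E2: (forall x exists y (x.z = z.y)) and (x = z.z)
E2 = ("and",
      ("forall", "x", ("exists", "y",
                       ("atom", "=", (times("x", "z"), times("z", "y"))))),
      ("atom", "=", ("x", times("z", "z"))))
print(free_and_bound(E1))   # ({'y'}, {'x'})
```

On E2 the function returns $\{x,z\}$ and $\{x,y\}$, which shows concretely that $x$ can be both free and bound (remark R1).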
1.4 Proofs
The proofs that we find in mathematics or theoretical physics books are assemblies of mathematical symbols and sentences containing keywords such as: "so", "because", "if", "if and only if", "it is necessary", "it suffices", "take an $x$ such that", "therefore", "assume", "seek a contradiction", etc. These words are assumed to be understood by everyone in the same way, which is, in fact, not always the case.
In any work, the purpose of a proof is to convince the reader of the truth of a statement by showing him the intellectual path that allows him to check for himself the truth and rigour of the statement. Depending on the level of the reader, this proof will be more or less detailed or formal: something that can be considered obvious in a graduate course may not be obvious in an undergraduate course.
In a homework, the corrector knows that the result to be proved is (normally) true and knows its proof. The student must prove the required result (correctly). The level of detail that the student must give will sometimes depend on the confidence the corrector has in him: in a good copy, a "proof by an evident induction" will be accepted, while in a copy where an earlier "obvious" was... obviously false, it will not pass!
To properly manage the level of detail, we should know what a complete proof is. This work of formalization was only done at the beginning of the 20th century!! Several things may seem surprising:
1. There is only a finite number of rules: two for each connector and quantifier (and for equality), plus three general rules. It was not at all clear a priori that a finite number of rules was enough to prove everything that is true. We will show this result (this is essentially the Completeness Theorem). The proof is not trivial at all.
2. These are the same rules for all of mathematics and physics: algebra, analysis, geometry, etc.
This means that we have managed to isolate what is general in reasoning!!!
We will see later that a proof is an assembly of pairs $(\Gamma,A)$, where $\Gamma$ is a set of formulas (the assumptions) and $A$ a formula (the conclusion). When we do arithmetic, geometry or real analysis, we use, in addition to the rules, assumptions that are named, as we know, "axioms". These express the particular properties of the objects that we manipulate (for details on the concept of axiom, see the Introduction section of the book).
We therefore prove, in general, formulas from a set of assumptions, and this set can vary during the proof: when we say "suppose $F$ and let us prove $G$", $F$ is then a new hypothesis that we can use to prove $G$. To formalize this, we introduce the concept of "sequent":
D1. A "sequent" is a pair $\Gamma\vdash F$ where:
(a) $\Gamma$ is a finite set of formulas that represents the assumptions that we can use. This set is also named the "context of the sequent".
(b) $F$ is a formula. This is the formula we want to prove. We say that this formula is the "conclusion of the sequent".
The sign "$\vdash$" must be read "thesis" or "proves".
D2. A sequent $\Gamma\vdash F$ is said to be "provable" (or demonstrable) if it can be obtained by a finite application of rules. A formula $F$ is provable if the sequent $\vdash F$ (with an empty context) is provable.
1.4.1 Rules of Proofs
Proof rules are the bricks used to build demonstration steps. A formal demonstration is a finite (and correct!) assembly of rules. This assembly is not linear (not a sequence) but a "tree": indeed, we are often forced to branch.
We will present a choice of rules. We could have introduced others (instead of these or in addition to them) that would give the same notion of provability. Those that have been chosen are "natural" and correspond to the arguments that we usually make in mathematics. In common practice we use, in addition to the rules below, many other rules, but these can be deduced from the previous ones.
We name them "derived rules".
It is traditional to write the root of the tree (the conclusion sequent) at the bottom and the leaves at the top: this is how nature does it... As it is also traditional to write on a sheet of paper from top to bottom, it would not be unreasonable to write the root at the top and the leaves at the bottom. We must make a choice!
A rule consists of:
• a set of "premises", each of which is a sequent (there may be zero, one or more of them);
• the conclusion sequent of the rule;
• a horizontal bar separating the premises (top) from the conclusion (bottom). On the right of the bar, we indicate the name of the rule.
Example:
$$\frac{\Gamma\vdash A\to B\qquad\Gamma\vdash A}{\Gamma\vdash B}\ {\to_e}\tag{1.40}$$
This rule has two premises ($\Gamma\vdash A\to B$ and $\Gamma\vdash A$) and one conclusion ($\Gamma\vdash B$), and is denoted in abbreviated form by $\to_e$. It can be read in two ways:
• from bottom to top: if we want to prove the conclusion, it suffices, by using the rule, to prove the premises. This is what we do when we seek a proof. This corresponds to the "analysis".
• from top to bottom: if we proved the premises, then we also proved the conclusion. This is what we do when we write a proof. This corresponds to the "synthesis".
For proofs there is a finite number of rules, 17 in total, which we define below:
1. Axiom:
$$\frac{}{\Gamma,A\vdash A}\ \text{ax}\tag{1.41}$$
From bottom to top: if the conclusion of the sequent is one of the hypotheses, then the sequent is provable.
2. Weakening:
$$\frac{\Gamma\vdash A}{\Gamma,B\vdash A}\ \text{aff}\tag{1.42}$$
Explanations:
• From top to bottom: if we prove $A$ under the assumptions $\Gamma$, we can still prove $A$ after adding other hypotheses.
• From bottom to top: some assumptions may not be needed.
3. Introduction of the implication:
$$\frac{\Gamma,A\vdash B}{\Gamma\vdash A\to B}\ {\to_i}\tag{1.43}$$
From bottom to top: to prove $A\to B$, we assume $A$ (that is to say, we add it to the assumptions) and we prove $B$.
4.
Elimination of the implication:
$$\frac{\Gamma\vdash A\to B\qquad\Gamma\vdash A}{\Gamma\vdash B}\ {\to_e}\tag{1.44}$$
From bottom to top: to prove $B$, if we know a theorem of the form $A\to B$, it suffices to prove the lemma $A$.
5. Introduction of the conjunction:
$$\frac{\Gamma\vdash A\qquad\Gamma\vdash B}{\Gamma\vdash A\wedge B}\ {\wedge_i}\tag{1.45}$$
From bottom to top: to prove $A\wedge B$, it suffices to prove $A$ and to prove $B$.
6. Elimination of the conjunction:
$$\frac{\Gamma\vdash A\wedge B}{\Gamma\vdash A}\ {\wedge_e^l}\qquad\text{or}\qquad\frac{\Gamma\vdash A\wedge B}{\Gamma\vdash B}\ {\wedge_e^r}\tag{1.46}$$
From top to bottom: from $A\wedge B$, we can deduce $A$ (left elimination) and $B$ (right elimination).
7. Introduction of the disjunction:
$$\frac{\Gamma\vdash A}{\Gamma\vdash A\vee B}\ {\vee_i^l}\qquad\text{or}\qquad\frac{\Gamma\vdash B}{\Gamma\vdash A\vee B}\ {\vee_i^r}\tag{1.47}$$
From bottom to top: to prove $A\vee B$, it suffices to prove $A$ (left disjunction) or to prove $B$ (right disjunction).
8. Elimination of the disjunction:
$$\frac{\Gamma\vdash A\vee B\qquad\Gamma,A\vdash C\qquad\Gamma,B\vdash C}{\Gamma\vdash C}\ {\vee_e}\tag{1.48}$$
From bottom to top: if we want to prove $C$ and we know that $A\vee B$ holds, it is enough to prove $C$ first by assuming $A$, and then by assuming $B$. This is case-based reasoning.
9. Introduction of the negation:
$$\frac{\Gamma,A\vdash\bot}{\Gamma\vdash\neg A}\ {\neg_i}\tag{1.49}$$
From bottom to top: to show $\neg A$, we assume $A$ and we prove the absurd ($\bot$).
10. Elimination of the negation:
$$\frac{\Gamma\vdash\neg A\qquad\Gamma\vdash A}{\Gamma\vdash\bot}\ {\neg_e}\tag{1.50}$$
From top to bottom: if we proved $\neg A$ and $A$, then we proved the absurd ($\bot$).
11. Classical absurdity (reductio ad absurdum):
$$\frac{\Gamma,\neg A\vdash\bot}{\Gamma\vdash A}\tag{1.51}$$
From bottom to top: to prove $A$, it suffices to prove the absurd by assuming $\neg A$. This rule amounts to saying: $A$ is true if and only if it is false that $A$ is false. This rule is not obvious, but it is necessary to prove some results (there are results we cannot prove without this rule). Contrary to many others, this rule may be applied at any time. We can, in fact, always say: to prove $A$, I suppose $\neg A$ and I seek a contradiction.
12. Introduction of the universal quantifier:
$$\frac{\Gamma\vdash A}{\Gamma\vdash\forall x\,A}\ {\forall_i}\qquad x\text{ is not free in the formulas of }\Gamma\tag{1.52}$$
From bottom to top: to prove $\forall x\,A$, it suffices to prove $A$ without making any assumption about $x$.
13. Elimination of the universal quantifier:
$$\frac{\Gamma\vdash\forall x\,A}{\Gamma\vdash A[x:=t]}\ {\forall_e}\tag{1.53}$$
From top to bottom: from $\forall x\,A$, we can deduce $A[x:=t]$ for any term $t$. We can also say it in this form: if we proved $A$ for all $x$, then we can use $A$ with any object $t$.
14. Introduction of the existential quantifier:
$$\frac{\Gamma\vdash A[x:=t]}{\Gamma\vdash\exists x\,A}\ {\exists_i}\tag{1.54}$$
From bottom to top: to prove $\exists x\,A$, it suffices to find an object (i.e. a term $t$) for which we know how to prove $A[x:=t]$.
15. Elimination of the existential quantifier:
$$\frac{\Gamma\vdash\exists x\,A\qquad\Gamma,A\vdash C}{\Gamma\vdash C}\ {\exists_e}\qquad x\text{ is not free in the formulas of }\Gamma\text{, nor in }C\tag{1.55}$$
From bottom to top: to prove $C$, it suffices to prove $\exists x\,A$ from the assumptions $\Gamma$ and then, taking $A$ as a new hypothesis about an $x$ of which we know nothing else, to prove $C$. Since this $x$ is arbitrary, the conclusion $C$ must not depend on it: this is why $x$ must not be free in $C$ (nor in $\Gamma$).
16. Introduction of equality:
$$\frac{}{\Gamma\vdash t=t}\ {=_i}\tag{1.56}$$
From bottom to top: we can always prove $t=t$. This rule means that equality is reflexive (see section Operators).
17. Elimination of equality:
$$\frac{\Gamma\vdash A[x:=t]\qquad\Gamma\vdash t=u}{\Gamma\vdash A[x:=u]}\ {=_e}\tag{1.57}$$
From top to bottom: if we prove $\Gamma\vdash A[x:=t]$ and $\Gamma\vdash t=u$, then we have proved $\Gamma\vdash A[x:=u]$. This rule expresses that equal objects have the same properties. We notice however that the formulas (or relations) $t=u$ and $u=t$ are not, formally, identical. We will have to prove that equality is symmetric (and we will take the opportunity to prove along the way that it is transitive).
Let us now see three examples, presented in the form of theorems as is customary in proof theory!
Theorem 4.1. Equality is symmetric (a little bit non-trivial, but quite good to begin with the subject).
Proof 4.1.1.
$$\dfrac{\dfrac{\dfrac{\dfrac{}{x_1=x_2\vdash x_1=x_1}\,{=_i}\qquad\dfrac{}{x_1=x_2\vdash x_1=x_2}\,\text{ax}}{x_1=x_2\vdash x_2=x_1}\,{=_e}}{\vdash x_1=x_2\to x_2=x_1}\,{\to_i}}{\vdash\forall x_1\,\forall x_2\,(x_1=x_2\to x_2=x_1)}\,{\forall_i}\times2\tag{1.58}$$
From top to bottom: we introduce the equality ($=_i$) to prove $x_1=x_1$ under the assumption $x_1=x_2$. At the same time, the axiom rule gives $x_1=x_2$ under the same assumption. From these two premises, we eliminate the equality ($=_e$, with $A$ the formula $x=x_1$, $t=x_1$ and $u=x_2$): substituting $x_2$ for the left-hand $x_1$ gives $x_2=x_1$ under the assumption $x_1=x_2$. The introduction of the implication then gives, without any assumption, $x_1=x_2\to x_2=x_1$. Finally, we introduce the universal quantifier for each variable (i.e. twice), which is allowed since the context is empty, and we obtain that equality is symmetric.
□ Q.E.D.
Theorem 4.2. Equality is transitive (that is to say, if $x_1=x_2$ and $x_2=x_3$, then $x_1=x_3$).
Proof 4.2.1. Denoting by $F$ the formula $(x_1=x_2)\wedge(x_2=x_3)$:
$$\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{}{F\vdash F}\,\text{ax}}{F\vdash x_1=x_2}\,{\wedge_e^l}\qquad\dfrac{\dfrac{}{F\vdash F}\,\text{ax}}{F\vdash x_2=x_3}\,{\wedge_e^r}}{F\vdash x_1=x_3}\,{=_e}}{\vdash F\to x_1=x_3}\,{\to_i}}{\vdash\forall x_1\,\forall x_2\,\forall x_3\,((x_1=x_2\wedge x_2=x_3)\to x_1=x_3)}\,{\forall_i}\times3\tag{1.59}$$
What do we do here? We first introduce the formula $F$ twice by the axiom rule, in order to "dissect" it afterwards on the left and on the right (unlike in the previous proof, we do not need the rule $=_i$ here). We then eliminate the conjunction on the left and on the right to work on each equality separately, and we eliminate the equality ($=_e$, with $A$ the formula $x_1=x$, $t=x_2$ and $u=x_3$), which gives $x_1=x_3$ under the assumption $F$. The introduction of the implication then gives, without any assumption, $F\to x_1=x_3$, and finally we introduce the universal quantifier three times to say that this holds for every value of the variables: equality is transitive.
□ Q.E.D.
And now a last, bigger example, again in the form of a theorem:
Theorem 4.3. Any involution is a bijection (see section Set Theory).
Proof 4.3.1.
Let $f$ be a unary function symbol (with one variable). We write (for details see the section Set Theory):
• $\text{Inj}[f]$ for the formula:
$$\forall x,y:\ f(x)=f(y)\to x=y\tag{1.60}$$
which means that $f$ is injective;
• $\text{Surj}[f]$ for the formula:
$$\forall y\,\exists x:\ f(x)=y\tag{1.61}$$
• $\text{Bij}[f]$ for the formula:
$$\text{Inj}[f]\wedge\text{Surj}[f]\tag{1.62}$$
• $\text{Inv}[f]$ for the formula:
$$\forall x:\ f(f(x))=x\tag{1.63}$$
which means that $f$ is an involution (we also write this $f\circ f=\text{Id}$, that is to say the composition of $f$ with itself is the identity).
We would like to know whether:
$$\text{Inv}[f]\vdash\text{Bij}[f]\tag{1.64}$$
We will present this proof (trying to make it as easy as possible) in four (!!!) different ways: traditional (informal), classic (pseudo-formal), formal in tree and formal in-line.
• Traditional method (informal):
We must prove that if $f$ is involutive then it is bijective. So we have two things to prove (and both must be satisfied simultaneously): the function is injective and surjective.
1. Let us first prove that an involution is injective. We must show that, for all $x,y$:
$$f(x)=f(y)\tag{1.65}$$
implies:
$$x=y\tag{1.66}$$
This follows from the definition of involution:
$$\forall x:\ f(f(x))=x,\qquad\forall y:\ f(f(y))=y\tag{1.67}$$
and from the application of $f$ to both sides of the relation:
$$f(x)=f(y)\tag{1.68}$$
(thus three equalities so far), which gives:
$$f(f(x))=f(f(y))\tag{1.69}$$
and therefore, replacing each side using (1.67):
$$x=y\tag{1.70}$$
2. Let us now prove that an involution is surjective. We must have:
$$\forall y\,\exists x:\ f(x)=y\tag{1.71}$$
Let us define the variable $x$ by using the involution itself:
$$x:=f(y)\tag{1.72}$$
Then $f(x)=f(f(y))$, and the definition of involution gives:
$$f(f(y))=y\tag{1.73}$$
so that $f(x)=y$, and surjectivity is ensured.
• Pseudo-formal method:
We take the same argument again and inject into it the rules of proof theory. We must show that if $f$ is involutive then it is bijective.
So we have two things to show ($\wedge_i$), and both must be satisfied simultaneously: the function is injective and surjective:
$$\text{Inj}[f]\wedge\text{Surj}[f]\tag{1.74}$$
1. Let us first prove that an involution is injective. We must show that ($\forall_i$), for all $x,y$:
$$f(x)=f(y)\tag{1.75}$$
implies ($\to_i$):
$$x=y\tag{1.76}$$
This follows ($\forall_e$, ax) from the definition of involution:
$$\forall x\ f(f(x))=x,\qquad\forall y\ f(f(y))=y\tag{1.77}$$
and from the application of $f$ to the relation:
$$f(x)=f(y)\tag{1.78}$$
(therefore three equalities, combined by $=_e$ twice) such that:
$$f(f(x))=f(f(y))\tag{1.79}$$
We therefore have:
$$x=y\tag{1.80}$$
2. Let us prove that an involution is surjective. If it is surjective, then we must have ($\forall_i$):
$$\forall y\,\exists x:\ f(x)=y\tag{1.81}$$
We define ($\exists_i$) the variable $x$ by using the involution itself:
$$x:=f(y)\tag{1.82}$$
and ($\forall_e$, ax) the definition of involution gives:
$$f(f(y))=y\tag{1.83}$$
so that:
$$f(x)=f(f(y))=y\tag{1.84}$$
and therefore surjectivity is ensured.
• Formal tree method:
Let us now do this with the graphical method that we presented above.
1. Let us prove that an involution is injective. For this we must first prove that:
$$f(x)=f(y)\vdash f(f(x))=f(f(y))\tag{1.85}$$
Taking for $A$ the formula $f(f(x))=f(z)$, where $z$ is a variable distinct from $x$ and $y$, the elimination of equality gives:
$$\dfrac{\dfrac{}{f(x)=f(y)\vdash f(f(x))=f(f(x))}\,{=_i}\qquad\dfrac{}{f(x)=f(y)\vdash f(x)=f(y)}\,\text{ax}}{f(x)=f(y)\vdash f(f(x))=f(f(y))}\,{=_e}\tag{1.86}$$
Remark: The latter deduction (from $t=u$, deduce $f(t)=f(u)$) is abbreviated $=_c$ and named (like others) a "derived rule", because it is an argument that is often made during proofs and a little time-consuming to develop each time...
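Therefore:
$$\dfrac{\dfrac{\dfrac{\dfrac{\text{(iii)}\qquad\text{(i)}}{\Gamma\vdash x=f(f(y))}\,{=_e}\qquad\text{(ii)}}{\Gamma\vdash x=y}\,{=_e}}{\text{Inv}[f]\vdash f(x)=f(y)\to x=y}\,{\to_i}}{\text{Inv}[f]\vdash\text{Inj}[f]}\,{\forall_i}\times2\tag{1.89}$$
where $\Gamma$ denotes the context $\text{Inv}[f],\,f(x)=f(y)$, and (i), (ii), (iii) are the derivations:
$$\text{(i)}\ \dfrac{\dfrac{}{\Gamma\vdash\text{Inv}[f]}\,\text{ax}}{\Gamma\vdash f(f(x))=x}\,{\forall_e}\qquad\text{(ii)}\ \dfrac{\dfrac{}{\Gamma\vdash\text{Inv}[f]}\,\text{ax}}{\Gamma\vdash f(f(y))=y}\,{\forall_e}\qquad\text{(iii)}\ \dfrac{\dfrac{}{\Gamma\vdash f(x)=f(y)}\,\text{ax}}{\Gamma\vdash f(f(x))=f(f(y))}\,{=_c}$$
2. Let us now prove that an involution is surjective:
$$\dfrac{\dfrac{\dfrac{\dfrac{}{\text{Inv}[f]\vdash\forall x\,f(f(x))=x}\,\text{ax}}{\text{Inv}[f]\vdash\{f(x)=y\}[x:=f(y)]}\,{\forall_e}}{\text{Inv}[f]\vdash\exists x\,\{f(x)=y\}}\,{\exists_i}}{\text{Inv}[f]\vdash\text{Surj}[f]}\,{\forall_i}\tag{1.90}$$
(the $\forall_e$ step instantiates $x$ by $y$: the formula $\{f(x)=y\}[x:=f(y)]$ is exactly $f(f(y))=y$).
It follows:
$$\dfrac{\text{Inv}[f]\vdash\text{Inj}[f]\qquad\text{Inv}[f]\vdash\text{Surj}[f]}{\text{Inv}[f]\vdash\text{Inj}[f]\wedge\text{Surj}[f]}\,{\wedge_i}\tag{1.91}$$
• Formal in-line method:
We can do the same thing in a slightly less... wide and more... indented form (it is no less indigestible):
$\text{Inv}[f]\vdash\text{Bij}[f]$ by $\wedge_i$, from (1) and (2) (1.92a)
(1) $\text{Inv}[f]\vdash\text{Inj}[f]$ by $\forall_i$, from
  $\text{Inv}[f]\vdash f(x)=f(y)\to x=y$ by $\to_i$, from
  $\text{Inv}[f],\,f(x)=f(y)\vdash x=y$ by $=_e$ (twice), from (i), (ii) and (iii):
  (i) $\text{Inv}[f]\vdash f(f(x))=x$ by $\forall_e$, from $\text{Inv}[f]\vdash\text{Inv}[f]$ (ax)
  (ii) $\text{Inv}[f]\vdash f(f(y))=y$ by $\forall_e$, from $\text{Inv}[f]\vdash\text{Inv}[f]$ (ax)
  (iii) $f(x)=f(y)\vdash f(f(x))=f(f(y))$ by $=_c$, from $f(x)=f(y)\vdash f(x)=f(y)$ (ax) (1.92b)
(2) $\text{Inv}[f]\vdash\text{Surj}[f]$ by $\forall_i$, from
  $\text{Inv}[f]\vdash\exists x\,\{f(x)=y\}$ by $\exists_i$, from
  $\text{Inv}[f]\vdash\{f(x)=y\}[x:=f(y)]$ by $\forall_e$, from
  $\text{Inv}[f]\vdash\forall x\,\{f(f(x))=x\}$ (ax) (1.92c)
(In (i), (ii) and (iii), the contexts are completed to $\text{Inv}[f],\,f(x)=f(y)$ by weakening, aff.)
□ Q.E.D.
Version: 3.1 Revision 5 | Last update: 2015-09-06 16:33
Numbers
The basis of mathematics, apart from reasoning (see section Proof Theory), is undoubtedly, for ordinary people, arithmetic. It is therefore mandatory that we stop on it to study its origin, some of its properties and consequences.
Numbers are the basis of Arithmetic, just as geometric figures are the basis of Geometry. They are also its historical basis, because mathematics probably started with the study of these objects, and its educational foundation, because it is by learning to count that we enter the world of mathematics.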
The history of numbers, also sometimes called "scalars", is far too long to be told here, but we can only recommend one of the best books on the subject: The Universal History of Numbers (2,000 pages in three volumes), Georges Ifrah, ISBN: 2221057791. Here is a short summary from it which seems fundamental to us:
Our current decimal system, in base 10, uses the digits 0 to 9, called "Arabic numerals" although they are in fact of Indian (Hindu) origin. The first numerals of this kind seem to have appeared in India around the third century BC; the Indian mathematician Brahmagupta (7th century AD) is associated with their use, and the figures were written in Devanagari. Indeed, the Arabic numerals (of Indian origin...) are in the first row of the table below, and we see that they are significantly different from the "Indian numbers" of the second row:
[Figure 4.2 - Indo-Arabic numbers: first row, the digits as written in Arabic; second row, 0 1 2 3 4 5 6 7 8 9]
This table is read from left to right: 0 "zero", 1 "one", 2 "two", 3 "three", 4 "four", 5 "five", 6 "six", 7 "seven", 8 "eight", 9 "nine".
This system is much more efficient than Roman numerals (try doing a calculation with the Roman notation system and you will see...). It is commonly accepted that these numerals were introduced in Europe only around the year 1000. Used in India, they were transmitted by the Arabs to the Western world, in particular through Gerbert of Aurillac (the future Pope Sylvester II) during his stay in Spain at the end of the 10th century.
Remark: The French word "chiffre" (digit) is a corruption of the Arabic word "sifr", meaning "zero". In Italian, "zero" is "zero", which seems to be a contraction of "zefiro"; here again we see an Arabic root, but "zero" could also be of Indian origin... So the words "chiffre" and "zero" have the same origin.
The early use of a numerical symbol for "nothing" in the sense of "no amount", i.e. our "zero", is due to the fact that the Indians used a system called a "positional system".
In such a system, the position of a digit in the writing of a number expresses a power of 10, and the digit itself gives the number of times this power occurs... Without a symbol for an empty position, numbers were very hard to read back correctly, which could lead to large errors in calculations. The revolutionary yet simple introduction of the concept of "nothing" allowed numbers to be read without error: the absence of a power is denoted by a small circle..., the zero. Our current system is thus the "decimal positional system".
Example:
[Figure 4.3 - Description of the decimal positional system: $324=3\times10^2+2\times10^1+4\times10^0$ (hundreds, tens, units)]
The number 324 is read from left to right as three hundreds (3 times 100), two tens (2 times 10) and four units (4 times 1).
A "decimal number" is thus a number that has a finite writing in base 10.
We sometimes see (and this is recommended) a thousands separator, represented by a comma in the United States and placed every three digits counting from the right for whole numbers. Thus, we write 1,034 instead of 1034, or 1,344,567,569 instead of 1344567569. Thousands separators let us quickly grasp the magnitude of the numbers we read. So:
• If we see only one comma, we know that the number is in the thousands
• If we see two commas, we know that the number is in the millions
• If we see three commas, we know that the number is in the billions
• and so on...
With decimals, this gives:
[Figure 4.4 - Scale representation of the positional system]
In fact, any integer greater than 1 can be taken as the base of a numbering system. We have for example the binary, ternary, quaternary, ..., decimal, duodecimal numbering systems, which correspond respectively to the bases two, three, four, ..., ten, twelve.
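The positional principle described above (each digit multiplied by a power of the base) can be sketched in Python for any base $b\ge2$. The function names are hypothetical; digits are given as a list of integers, most significant first. Going from digits to a value uses Horner's scheme, the reverse direction uses repeated division by $b$, and the thousands grouping reproduces the comma convention (Python's built-in `format(n, ",")` gives the same result for non-negative integers).

```python
def digits_to_int(digits, b):
    """Value of a digit list (most significant first) in base b, by Horner's scheme."""
    if b < 2:
        raise ValueError("the base must be an integer >= 2")
    n = 0
    for d in digits:
        if not 0 <= d < b:
            raise ValueError(f"digit {d} is not valid in base {b}")
        n = n * b + d          # shift one position left, then add the new digit
    return n


def int_to_digits(n, b):
    """Digits of a non-negative integer n in base b, by repeated division."""
    if b < 2 or n < 0:
        raise ValueError("need b >= 2 and n >= 0")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, r = divmod(n, b)    # r is the digit of the current lowest order
        digits.append(r)
    return digits[::-1]        # most significant digit first


def group_thousands(n, sep=","):
    """Write a non-negative integer with a separator every three digits."""
    s = str(n)
    groups = []
    while len(s) > 3:
        groups.append(s[-3:])
        s = s[:-3]
    groups.append(s)
    return sep.join(reversed(groups))


print(digits_to_int([3, 2, 4], 10))    # 324
print(digits_to_int([0, 1, 1, 0], 2))  # 6
print(int_to_digits(6, 2))             # [1, 1, 0]
print(group_thousands(1344567569))     # 1,344,567,569
```

Horner's scheme avoids computing the powers $b^i$ explicitly: each new digit simply shifts the previous value one order higher.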
Generalizing what has been seen above, any positive integer $N$ can be represented in a base $b$ as a sum where each coefficient $a_i$ is multiplied by its respective weight $b^i$:
$$N=a_{n-1}b^{n-1}+a_{n-2}b^{n-2}+\ldots+a_1b^1+a_0b^0\tag{2.1}$$
More elegantly written:
$$N=\sum_{i=0}^{n-1}a_ib^i\tag{2.2}$$
with $a_i\in\{0,\ldots,b-1\}$ and $b^i\in[1,b^{n-1}]$.
2.1 Digital Bases
To write a number in a base $b$ system, we must first adopt $b$ characters to represent the first $b$ numbers, for example in the decimal system: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. These characters are, as we already defined them, the "digits", which we pronounce as usual: {zero, one, two, three, four, five, six, seven, eight, nine}.
For written numbers, we adopt the convention that a digit placed to the left of another represents units of the immediately higher order, i.e. $b$ times larger. To take the place of units that may be lacking in certain orders, we use the zero "0"; consequently, the number of digits may vary.
Definition (#17): For spoken numbers, we agree to name "unit", "ten", "hundred", "thousand", etc., the units of the first, second, third, fourth order, etc. Thus the numbers 10, 11, ..., 19 will be read in the same way in all numbering systems. The numbers 1a, 1b, a0, b0, ... will be read ten-a, ten-b, a-ten, b-ten, etc. Thus, the number 5b6a71c will be read:
five million b-hundred sixty-a thousand seven hundred ten-c
This small example is relevant because it shows that the spoken language we use daily is intuitively in base ten (a consequence of our education).
Remarks
R1. The rules of the mathematical operations defined for numbers written in the decimal system are the same for numbers written in any numbering system.
R2.
To operate quickly in any numbering system, it is useful to know by heart all the sums and products of two single-digit numbers.
R3. The decimal system seems to have its origin in the fact that human beings have ten fingers.
Let us see how we convert from one numbering system to another:
Examples:
E1. In base ten, we have seen above that 142,713 is written as:
$$142{,}713_{10}=1\cdot10^5+4\cdot10^4+2\cdot10^3+7\cdot10^2+1\cdot10^1+3\cdot10^0\tag{2.4}$$
E2. The number 0110 in base two (the binary base) would be written in base 10 as:
$$0110_2=0\cdot2^3+1\cdot2^2+1\cdot2^1+0\cdot2^0=6_{10}\tag{2.5}$$
and so on...
The reverse operation is often a little trickier (for example in the case of the binary base):
2.2 Types of Numbers
Now that we know that a number is a mathematical object used to count, measure and label, we should know that there exists in mathematics a wide variety of numbers (natural, rational, real, irrational, complex, p-adic, quaternions, transfinite, algebraic, constructible, etc.), since any mathematician may at leisure create their own numbers just by defining axioms (rules) for manipulating them (see section Set Theory).
However, there are a few of them that we find much more often than others throughout this book, and some that serve as the basic construction for others; these should be defined sufficiently rigorously (without going to extremes) so that we know what we are talking about when we use them.
2.2.1 Natural Integer Numbers
The idea of "integer" (the numbers without decimals) is the fundamental concept of mathematics and comes from the sight of a group of objects of the same type (a sheep, another sheep, yet another sheep, etc.). When the amount of objects in a group differs from that of another group, we speak of a group that is numerically higher or lower, regardless of the type of objects in these groups.
When the amounts of objects in two or more groups are the same, we speak of "equality".
To each single object, the number "one" or "unit", denoted by "1" in the decimal system, is associated. To form groups of objects, we can proceed as follows: to an object, add another object, then another, and so on... each of these groups, from the point of view of its quantity, is characterized by a number. It follows that a number can be regarded as representing a group of units (single items) such that each unit corresponds to one single object of the collection.
Definition (#18): Two numbers are said to be "equal" if to each of the units of one we can match a unique unit of the other and vice versa (in a bijective way, as seen in the section Set Theory). If this does not hold, we talk about "inequality".
Let us take an object, then another, then add again an object to the group thus formed, and so on. The groups thus formed are characterized by numbers which, taken in the same order as the groups successively obtained, form the "natural sequence" $\mathbb{N}$, also sometimes named "whole numbers", and denoted by:
$$\mathbb{N}=\{0,1,2,3,\ldots\}\tag{2.8}$$
To be unambiguous about whether 0 is included or not, an index (or superscript) is sometimes added when it is excluded:
$$\mathbb{N}^*=\mathbb{N}_1=\{1,2,3,\ldots\}\tag{2.9}$$
Remark: The presence of 0 (zero) in our definition of $\mathbb{N}$ is debatable since it is neither positive nor negative. That is why in some books you will find a definition of $\mathbb{N}$ without 0.
The elements of this natural set can be defined (we owe this definition to the mathematician Gottlob Frege) by the following properties (reading the section Set Theory first is strongly recommended...):
P1. 0 (read "zero") is the number of elements (defined as an equivalence class) of all the sets equivalent to (in bijection with) the empty set.
P2.
1 (read "one") is the number of elements of all sets equivalent to the set whose only element is 1 . P3. 2 (read "two") is the number of elements of all sets equivalent to the set whose only element are 1 and 2. P4. In general, an integer is the number of elements of all sets equivalent to the set of integers preceding it! The construction of the set of natural numbers is made of the most natural and consistent man- ner. Natural numbers get their name from what they were, in the beginnings of their existence, to count quantities and things of nature or intervened in human life. The originality of this set lies in the empirical way he has been built since it does not actually the result of a mathematical definition, but more by awareness of the human by the concept of countable quantity, of number and operations that reflect the relations between them. The question about the origin of N is therefore the question of the origin of mathematics. And since thousands of years debates confronting the thoughts of the greatest philosophical minds have attempted to elucidate this deep mystery as to whether mathematics is a pure creation of the human mind or whether the man has only rediscovered a science that already existed in nature. Besides the many philosophical questions that the set of Natural numbers can generate, it is nonetheless interesting from an exclusively mathematical point of view. Because of its structure, it has remarkable properties that can be very useful when we practice some given reasoning or calculations. The sequence of natural numbers is unlimited (see section Theory Of Numbers) but countable (we will this property in details below), because in a group of objects that is represented by a number n, it will be enough to add an object to get another group that will be defined by the integer n + 1. Definition (#19): Two integers that differs from a single positive unit are said to be "consecu- tive". 104/5785 info @ sciences. ch EAME v3. 5-2013 4. 
2.2.1.1 Peano Axioms
During the crisis of the foundations of mathematics, mathematicians naturally sought to axiomatize the set $\mathbb{N}$, and we owe the current axiomatization to Peano and Dedekind.
The axioms of this system include the symbols $<$ and $=$ to represent the relations "smaller than" and "equal to" (see section Operators). They also include the symbol "0" for the number zero and $s$ to represent the "successor" function. In this system, 1 is denoted by:
$$1 = s(0) \tag{2.10}$$
named the "successor of zero", and 2 is denoted by:
$$2 = s(s(0)) = s(1) \tag{2.11}$$
The Peano axioms that build $\mathbb{N}$ are (see section Proof Theory for details on some of the symbols used below):
A1. 0 is a natural number (this ensures that $\mathbb{N}$ is not empty).
A2. Every natural number $n$ has a successor, denoted by $s(n)$, and $s$ is an injective map (see section Set Theory), that is to say:
$$\forall x, y \quad s(x) = s(y) \Rightarrow x = y \tag{2.12}$$
In other words, if two successors are equal, they are the successors of the same number.
A3. The successor of a natural number is never zero (therefore $\mathbb{N}$ has a first element):
$$\forall x \quad \neg\bigl(s(x) = 0\bigr) \tag{2.13}$$
A4. If a property $\varphi$ is true for 0 and, whenever it is true for $x$, it is also true for its successor $s(x)$, then this property is true for every $x$ ("axiom of induction"):
$$\Bigl(\varphi(0) \wedge \forall x\,\bigl(\varphi(x) \Rightarrow \varphi(s(x))\bigr)\Bigr) \Rightarrow \forall x\; \varphi(x) \tag{2.14}$$
The set of all the numbers satisfying the four axioms above is denoted by:
$$\mathbb{N} = \{0, 1, 2, 3, \dots, n, \dots\} \tag{2.15}$$
Remark
The Peano axioms allow us to build very rigorously the two basic operations of arithmetic in $\mathbb{N}$, addition and multiplication (see section Operators), and thereby all the other sets that we will see later (subtraction is not always possible in $\mathbb{N}$, because it can give negative numbers).
2.2.1.2 Odd, Even and Perfect Numbers
In arithmetic, studying the parity of an integer means determining whether or not this integer is a multiple of 2.
An integer that is a multiple of 2 is an even integer; the others are odd integers.
Definitions (#20):
D1. The numbers obtained by counting in steps of 2 from zero (i.e. 0, 2, 4, 6, 8, ...) in the set of natural integers $\mathbb{N}$ are named "even numbers". The $(n+1)$-th even number is obviously given by the relation:
$$E:\; 2n = n + n \qquad \forall n \in \mathbb{N} \tag{2.16}$$
D2. The numbers obtained by counting in steps of 2 starting from 1 (i.e. 1, 3, 5, 7, ...) in the set of natural integers $\mathbb{N}$ are named "odd numbers". The $(n+1)$-th odd number is almost obviously given by the relation:
$$O:\; 2n + 1 \qquad \forall n \in \mathbb{N} \tag{2.17}$$
Remark
We name "perfect numbers" the numbers that are equal to the sum of their positive divisors strictly smaller than themselves (a concept we will see in detail later), such as $6 = 1 + 2 + 3$ and $28 = 1 + 2 + 4 + 7 + 14$.
2.2.1.3 Prime Numbers
Definition (#21): A "prime number" is a natural integer with exactly two positive divisors (namely 1 and the number itself). When there are more than two divisors, the number is named a "composite number". The property of being prime (or not) is named "primality".
The study of prime numbers is a huge subject in mathematics (see, for a small example, the sections Number Theory and Cryptography). There are books of thousands of pages on the subject, and probably hundreds of research articles per month even nowadays. Most of these theorems are far beyond the scope of this book (and of the interest of its main author...)!
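The definitions of parity (#20), perfect numbers and primality (#21) translate almost literally into code. The Python sketch below uses naive trial division, which is only suitable for small numbers, and its function names are hypothetical. To produce many primes at once it uses the classical sieve of Eratosthenes, a standard technique that the text itself does not describe; it regenerates the list of primes below 1000 that follows.

```python
def divisors(n: int) -> list[int]:
    """All positive divisors of n >= 1, by trial division (fine for small n)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def is_even(n: int) -> bool:
    """Definition #20: an even integer is a multiple of 2."""
    return n % 2 == 0


def is_perfect(n: int) -> bool:
    """n equals the sum of its positive divisors strictly smaller than itself."""
    return n > 0 and sum(divisors(n)[:-1]) == n


def is_prime(n: int) -> bool:
    """Definition #21: exactly two positive divisors (1 and n), so 1 is not prime."""
    return n >= 1 and len(divisors(n)) == 2


def sieve(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes strictly less than `limit`."""
    candidate = [True] * max(limit, 2)
    candidate[0] = candidate[1] = False
    p = 2
    while p * p < limit:
        if candidate[p]:
            # every composite below limit has a prime factor p with p*p < limit
            for multiple in range(p * p, limit, p):
                candidate[multiple] = False
        p += 1
    return [n for n in range(limit) if candidate[n]]


print([n for n in range(1, 500) if is_perfect(n)])  # [6, 28, 496]
print(len(sieve(1000)), sieve(1000)[-1])           # 168 997
```

The sieve crosses out multiples instead of testing each number separately, which is why it scales much better than `is_prime` for producing whole lists.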
Here is the set of prime numbers less than 1000:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997
The whole set of prime numbers is sometimes denoted by $\mathbb{P}$.
Remark
Note that the set of prime numbers does not include the number 1, because 1 has only a single divisor (itself) and not two, as the definition requires.
We may ask ourselves whether there are infinitely many prime numbers. The answer is YES, and here is a proof (among others) by contradiction.
Proof 4.3.2. Suppose that there is a finite number of prime numbers, denoted by:
$$p_1, p_2, \dots, p_n \tag{2.18}$$
We create a new number from the product of these prime numbers, to which we add 1:
$$N = (p_1 p_2 \cdots p_n) + 1 \tag{2.19}$$
According to our initial hypothesis and the fundamental theorem of arithmetic (see section Number Theory), the new number $N$ (which is greater than 1) should be divisible by one of the existing primes $p_i$, so that we can write:
$$N = q \cdot p_i \tag{2.20}$$
where $q$ is an integer. We can carry out the division:
$$q = \frac{(p_1 p_2 \cdots p_n) + 1}{p_i} = \frac{p_1 p_2 \cdots p_n}{p_i} + \frac{1}{p_i} \tag{2.21}$$
The first term simplifies, since $p_i$ appears in the product. Let us denote the resulting integer by $E$:
$$q = E + \frac{1}{p_i} \tag{2.22}$$
But $q$ and $E$ are integers, so $1/p_i$ should be an integer.
But $p_i$ is by definition greater than 1, so $1/p_i$ is not an integer, and therefore neither is $q$. This is a contradiction, and we conclude that the prime numbers are not finite in number but infinite. □ Q.E.D.
Remarks
R1. The product $p_n\# = p_1 p_2 \cdots p_n$ of the first $n$ prime numbers is named the "$n$-th primorial".
R2. We refer the reader to the section Cryptography of the chapter Theoretical Computing (or to the section Number Theory of the chapter Arithmetic) for the study of some remarkable properties of prime numbers, including the famous Euler $\varphi$ function (also named "Euler's totient function"), and for a 20th-21st century industrial application of prime numbers.
2.2.2 Relative Integer Numbers
The set of natural integers $\mathbb{N}$ has a few issues that we did not mention earlier. For example, subtracting two numbers in $\mathbb{N}$ does not always give a result in $\mathbb{N}$ (negative numbers do not exist in this set). Another issue: dividing two numbers in $\mathbb{N}$ does not always give a result in $\mathbb{N}$ either (fractional numbers do not exist in this set). We then say, in the language of set theory, that subtraction and division are not internal operations of $\mathbb{N}$.
We can first resolve the problem of subtraction by adding negative integers to the set of natural numbers $\mathbb{N}$ (a revolutionary concept for those who were behind it in their time!) to get the set of "relative integers", denoted by $\mathbb{Z}$ (for "Zahl", German for "number"):
$$\mathbb{Z} = \{\dots, -3, -2, -1, 0, 1, 2, 3, \dots\} \tag{2.23}$$
The set of natural integers is therefore included in the set of relative integers. This is what we denote by (see section Set Theory):
$$\mathbb{N} \subset \mathbb{Z} \tag{2.24}$$
and we have by definition (these notations are to be learned!!!):
$$\begin{aligned} \mathbb{Z}^+ &= \mathbb{Z}_{>0} = \{n \in \mathbb{Z} \mid n > 0\} = \mathbb{N}^* \\ \mathbb{Z}^+_0 &= \mathbb{Z}_{\geq 0} = \{n \in \mathbb{Z} \mid n \geq 0\} = \mathbb{N} \\ \mathbb{Z}^- &= \mathbb{Z}_{<0} = \{n \in \mathbb{Z} \mid n < 0\} \\ \mathbb{Z}^* &= \mathbb{Z}_{\neq 0} = \{n \in \mathbb{Z} \mid n \neq 0\} \end{aligned} \tag{2.25}$$
This set was originally created to turn the natural numbers into an object that we name a "group" (see section Set Theory) with respect to addition.
Definition (#22): We say that a set $A$ is a "countable set" if it is equipotent to $\mathbb{N}$, that is to say if there is a bijection (see section Set Theory) from $A$ onto $\mathbb{N}$. Thus, roughly speaking, two equipotent sets have the same number of elements in the sense of their cardinal (see section Set Theory), or at least the same infinity.
The purpose of this concept is to show that the sets $\mathbb{N}$ and $\mathbb{Z}$ are countable.
Proof 4.3.3. Let us show that $\mathbb{Z}$ is countable by writing:
$$x_{2k} = k \quad\text{and}\quad x_{2k+1} = -k - 1$$
for any integer $k \geq 0$. This gives the following ordered list: $0, -1, 1, -2, 2, -3, 3, \dots$ of all relative integers, indexed by natural integers only! □ Q.E.D.
2.2.3 Rational Numbers
The set of relative integers $\mathbb{Z}$ still has an issue: dividing two numbers in $\mathbb{Z}$ does not always give a result in $\mathbb{Z}$ (fractional numbers do not exist in this set). We then say, in the language of set theory, that division is not an internal operation of $\mathbb{Z}$.
We can thus define a new set that contains all the numbers that can be written as a "fraction", that is to say as the ratio of a dividend (numerator) and a divisor (denominator). When a number can be written in this form, we say that it is a "fractional number":
$$\frac{\text{Numerator}}{\text{Denominator}} \tag{2.28}$$
A fraction can be used to express a part of something (an object, a distance, a piece of land, an amount of money, a cake...).
To better understand rational numbers (fractions), let us consider two individuals, Andy and Bobby, who both love pizza. On Monday night, they share a pizza equally. How much of the pizza does each one get? Are you thinking that each boy gets half of the pizza? That's right. There is one whole pizza, evenly divided into two parts, so each boy gets one of the two equal parts.
In math, we write $\frac{1}{2}$ to mean one out of two parts:
Figure 4.6 - A 1/2 fraction example (source: OpenStax)
On Tuesday, Andy and Bobby share a pizza with their parents, Fred and Christy, each person getting an equal amount of the whole pizza. How much of the pizza does each person get? There is one whole pizza, divided evenly into four equal parts. Each person has one of the four equal parts, so each person gets $\frac{1}{4}$ of the pizza:
Figure 4.7 - A 1/4 fraction example (source: OpenStax)
On Wednesday, the family invites some friends over for a pizza dinner. There are a total of 12 people. If they share the pizza equally, each person gets $\frac{1}{12}$ of the pizza:
Figure 4.8 - A 1/12 fraction example (source: OpenStax)
By definition, the "set of rational numbers" is given by:
$$\mathbb{Q} = \left\{ \frac{p}{q} \;\middle|\; (p, q) \in \mathbb{Z}^2,\ q \neq 0 \right\} \tag{2.29}$$
In other words, a rational number is any number that can be expressed as the quotient or fraction $p/q$ of two integers $p$ and $q$, with the denominator $q$ not equal to zero. Since $q$ may be equal to 1, every integer is a rational number. We therefore have:
$$\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \tag{2.30}$$
The logic behind the creation of the set of rational numbers $\mathbb{Q}$ is similar to that of the relative integers $\mathbb{Z}$. Indeed, mathematicians wanted to extend the set of relative integers $\mathbb{Z}$ so that its non-zero elements form a "group" with respect to multiplication (see section Set Theory).
Moreover, contrary to the intuition of most people, the set of natural integers $\mathbb{N}$ and the set of rational numbers $\mathbb{Q}$ are equipotent. We can convince ourselves of this equipotence by first arranging, as Cantor did, the positive rational numbers as follows:
$$\begin{array}{cccccccc} 1/1 & 2/1 & 3/1 & 4/1 & 5/1 & 6/1 & 7/1 & \cdots \\ 1/2 & 2/2 & 3/2 & 4/2 & 5/2 & 6/2 & 7/2 & \cdots \\ 1/3 & 2/3 & 3/3 & 4/3 & 5/3 & 6/3 & 7/3 & \cdots \\ 1/4 & 2/4 & 3/4 & 4/4 & 5/4 & 6/4 & 7/4 & \cdots \\ 1/5 & 2/5 & 3/5 & 4/5 & 5/5 & 6/5 & 7/5 & \cdots \\ 1/6 & 2/6 & 3/6 & 4/6 & 5/6 & 6/6 & 7/6 & \cdots \\ 1/7 & 2/7 & 3/7 & 4/7 & 5/7 & 6/7 & 7/7 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \end{array}$$
Figure 4.9 - Cantor diagonal method (the arrows of the figure zigzag along the successive diagonals: 1/1, 2/1, 1/2, 1/3, 2/2, 3/1, 4/1, 3/2, 2/3, 1/4, 1/5, ...)
This table is constructed so that every positive rational number appears in it and is reached by traversing the table diagonal by diagonal, hence the name of the method: "Cantor diagonal". If we remove from each diagonal the fractions equivalent to one already met (the "equivalent fractions"), keeping only the irreducible ones (i.e. those whose numerator and denominator have a greatest common divisor equal to 1), then we can define a map $f: \mathbb{N} \to \mathbb{Q}$ that is injective (two distinct ranks give two distinct rational numbers) and surjective (every positive rational number is reached at some rank). The map $f$ is therefore bijective, and 0 and the negative rationals can then be interleaved exactly as was done for $\mathbb{Z}$: $\mathbb{N}$ and $\mathbb{Q}$ are indeed equipotent!
A slightly more rigorous (and therefore less fun) definition of $\mathbb{Q}$ from $\mathbb{Z}$ is as follows (the notation used is worth noting). On the set $\mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$, which should be read as the set of pairs of relative integers whose second component is non-zero, we consider the relation $R$ between two pairs defined by:
$$(a, b)\,R\,(a', b') \Leftrightarrow ab' = a'b \tag{2.31}$$
We then easily verify that $R$ is an equivalence relation (see section Operators) on $\mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$. The set of equivalence classes for this relation, denoted $(\mathbb{Z} \times (\mathbb{Z} \setminus \{0\}))/R$, is by definition $\mathbb{Q}$. That is to say, we write more rigorously:
$$\mathbb{Q} = \bigl(\mathbb{Z} \times (\mathbb{Z} \setminus \{0\})\bigr)/R \tag{2.32}$$
The equivalence class of $(a, b) \in \mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$ is explicitly denoted by:
$$\frac{a}{b} \tag{2.33}$$
in accordance with the notation that everyone is accustomed to.
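The two enumerations used so far, $x_{2k} = k$ and $x_{2k+1} = -k-1$ for $\mathbb{Z}$ (Proof 4.3.3) and the diagonal walk through Cantor's table for the positive rationals (Figure 4.9), can be sketched as Python code. This is a minimal illustration with hypothetical names: we assume the zigzag direction of Figure 4.9 (1/1, 2/1, 1/2, 1/3, 2/2, 3/1, ...), and irreducibility is tested with `math.gcd`, so that each positive rational is produced exactly once. The function `related` simply evaluates the relation $R$ of (2.31).

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def z_enumeration(n: int) -> int:
    """Proof 4.3.3: x_{2k} = k and x_{2k+1} = -k - 1."""
    k, r = divmod(n, 2)
    return k if r == 0 else -k - 1


def positive_rationals():
    """Walk Cantor's table diagonal by diagonal (p + q constant), zigzagging,
    and skip the fractions that are not irreducible (gcd(p, q) != 1)."""
    for s in count(2):  # s = p + q indexes the diagonal
        qs = range(1, s) if s % 2 else range(s - 1, 0, -1)
        for q in qs:
            p = s - q
            if gcd(p, q) == 1:
                yield Fraction(p, q)


def related(a: int, b: int, a2: int, b2: int) -> bool:
    """Relation R of (2.31): (a, b) R (a', b')  <=>  a * b' == a' * b (b and b' non-zero)."""
    return a * b2 == a2 * b


print([z_enumeration(n) for n in range(7)])   # [0, -1, 1, -2, 2, -3, 3]
print([str(x) for x in islice(positive_rationals(), 10)])
print(related(1, 2, 2, 4))                    # True: 1/2 and 2/4 are the same rational
```

The generator never ends, which mirrors the fact that the enumeration is infinite; `islice` takes as many terms as needed.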
We easily check that the addition and multiplication operations defined on $\mathbb{Z}$ carry over without problems to $\mathbb{Q}$ by writing:
$$\frac{a}{b} \cdot \frac{p}{q} = \frac{a \cdot p}{b \cdot q} \quad\text{and}\quad \frac{a}{b} + \frac{p}{q} = \frac{aq + bp}{bq} \tag{2.34}$$
Moreover, these operations give $\mathbb{Q}$ the structure of a field (see section Set Theory), with $\frac{0}{1}$ as the neutral element for addition and $\frac{1}{1}$ as the neutral element for multiplication. Thus, any non-zero element of $\mathbb{Q}$ is invertible; indeed, for $a \neq 0$:
$$\frac{a}{b} \cdot \frac{b}{a} = \frac{ab}{ab} = \frac{1}{1} \tag{2.35}$$
which is also written more technically as:
$$(ab, ab)\,R\,(1, 1) \tag{2.36}$$
Remark
One might want to define $\mathbb{Q}$ simply as the set $\mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$, where $\mathbb{Z}$ represents the numerators and $\mathbb{Z} \setminus \{0\}$ the denominators of the rationals, but this is not possible, because we would then have, for example, $(1, 2) \neq (2, 4)$ whereas we expect an equality. Hence the need to introduce an equivalence relation that allows us to identify, to return to the previous example, $(1, 2)$ and $(2, 4)$. The relation $R$ that we have defined does not fall from heaven: the reader who has handled rationals so far without ever having seen their formal definition knows that:
$$\frac{a}{b} = \frac{a'}{b'} \Leftrightarrow ab' = a'b \tag{2.37}$$
It is therefore almost natural to define the relation $R$ as we have done. In particular, regarding the above example, $\frac{1}{2} = \frac{2}{4}$ because $(1, 2)\,R\,(2, 4)$, and the problem is solved.
In addition to the historical circumstances of its creation, this new set is distinguished from the relative integers because it introduces the original and paradoxical concept of partial quantities. This notion, which a priori does not make sense, found its place in the human mind thanks to geometry, where the ideas of a fraction of a length and of proportion are illustrated more intuitively.
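The operations (2.34) can be carried out directly on pairs $(a, b)$ provided each result is reduced to a canonical representative of its equivalence class for $R$ (numerator and denominator coprime, denominator positive). The sketch below is only illustrative and its function names are hypothetical; Python's standard `fractions.Fraction` class implements the same arithmetic and is used in the last lines for comparison.

```python
from fractions import Fraction
from math import gcd


def normalize(a: int, b: int) -> tuple[int, int]:
    """Canonical representative of the class of (a, b): gcd 1 and positive denominator."""
    if b == 0:
        raise ZeroDivisionError("denominator must be non-zero")
    g = gcd(a, b)  # gcd(0, b) == |b|, so every 0/b becomes 0/1
    a, b = a // g, b // g
    return (-a, -b) if b < 0 else (a, b)


def add(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    (a, b), (p, q) = x, y
    return normalize(a * q + b * p, b * q)  # sum in (2.34)


def mul(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    (a, b), (p, q) = x, y
    return normalize(a * p, b * q)  # product in (2.34)


def inverse(x: tuple[int, int]) -> tuple[int, int]:
    a, b = x
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return normalize(b, a)  # b/a, cf. (2.35)


print(normalize(2, 4) == normalize(1, 2))  # True: (1, 2) R (2, 4)
print(add((1, 2), (1, 3)))                 # (5, 6)
print(mul((3, 7), inverse((3, 7))))        # (1, 1), the neutral element
print(Fraction(1, 2) + Fraction(1, 3))     # 5/6 with the standard library
```

Choosing one canonical pair per class is a practical way of working in the quotient set $(\mathbb{Z} \times (\mathbb{Z} \setminus \{0\}))/R$ without manipulating whole equivalence classes.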
2.2.4 Irrational Numbers
It may seem odd to present irrational numbers before real numbers (see further below), but this is the order in which they were discovered in human history, and it therefore seems more pedagogical to us to present them in this order.
So the set of rationals $\mathbb{Q}$ is also limited and unfortunately not sufficient either. Indeed, we might think that all mathematical computations with the commonly known operations reduce to this set, but this is not the case!
Examples:
E1. Let us calculate the square root of two, which we denote $\sqrt{2}$ (think of the Pythagorean theorem: in a right triangle whose two legs have length 1, the hypotenuse has length $\sqrt{2}$). Suppose it is rational. Then we should be able to express it as $a/b$, where $a$ and $b$ are integers that we may take with no common factors. For this reason, $a$ and $b$ cannot both be even numbers. There are three remaining possibilities:
1. $a$ is odd (then $b$ is even)
2. $a$ is even (then $b$ is odd)
3. $a$ is odd (then $b$ is odd)
By squaring $\sqrt{2} = a/b$, we have:
$$2 = \frac{a^2}{b^2} \tag{2.38}$$
which can be written:
$$2b^2 = a^2 \tag{2.39}$$
Since the square of an odd number is odd and the square of an even number is even, case (1) is not possible, because $a^2$ would be odd and $2b^2$ would be even. Case (2) is also impossible, because we could then write $a = 2c$, where $c$ is an integer, and squaring gives $a^2 = 4c^2$. Substituting into $2b^2 = a^2$, we obtain after simplification $b^2 = 2c^2$. Then $b^2$ would be odd (since $b$ is odd) while $2c^2$ would be even. Case (3) is also impossible, because $a^2$ is then odd while $2b^2$ is even (whether $b$ is even or odd!). There is therefore no solution! That is to say, the starting assumption is false, and there do not exist two integers $a$ and $b$ such that $\sqrt{2} = a/b$.
E2. Let us prove by contradiction that the famous Euler number $e$ is irrational.
To do this, recall that $e$ (see section Functional Analysis) can also be defined by the Taylor series (see section Sequences and Series):
$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \dots + \frac{1}{n!} + \dots \tag{2.40}$$
If $e$ were rational, it could be written in the form $p/q$ (with $q > 1$, because we know that $e$ is not an integer). Let us multiply both sides of the equality by $q!$:
$$q!\,e = q! + \frac{q!}{1!} + \frac{q!}{2!} + \frac{q!}{3!} + \dots + \frac{q!}{q!} + \frac{q!}{(q+1)!} + \frac{q!}{(q+2)!} + \dots \tag{2.41}$$
The left-hand side $q!\,e = q!\,p/q = (q-1)!\,p$ would then be an integer, because by definition of the factorial:
$$q! = q \cdot (q-1) \cdot (q-2) \cdots 2 \cdot 1 \tag{2.42}$$
is an integer. The first terms of the right-hand side, up to the term $q!/q! = 1$, are also integers, because $q!/m!$ simplifies to an integer when $q \geq m$. So by subtraction we find:
$$q!\,e - \left(q! + \frac{q!}{1!} + \frac{q!}{2!} + \frac{q!}{3!} + \dots + \frac{q!}{q!}\right) = \frac{q!}{(q+1)!} + \frac{q!}{(q+2)!} + \dots \tag{2.43}$$
where the right-hand side should therefore be an integer! After simplification, this right-hand side becomes:
$$\frac{1}{q+1} + \frac{1}{(q+1)(q+2)} + \dots \tag{2.44}$$
Since $q \geq 2$, the first term of this sum is strictly less than 1/2, the second strictly less than 1/4, the third strictly less than 1/8, etc. So, since each term is strictly less than the corresponding term of the following geometric series, which converges to 1:
$$\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \dots = 1 \tag{2.45}$$
the sum is strictly between 0 and 1 and therefore cannot be an integer. This is a contradiction!
Thus, rational numbers cannot express $\sqrt{2}$ and $e$ (to cite only these two particular examples). They must therefore be complemented by the set of all numbers that cannot be written as a fraction (the ratio of an integer dividend and an integer divisor) and that we name "irrational numbers". Finally, we can say:
Definition (#23): In mathematics, an "irrational number" is any real number that cannot be expressed as a ratio of integers. Irrational numbers cannot be represented as terminating or repeating decimals.
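The key step of Example E2, namely that the quantity (2.43) lies strictly between 0 and 1, can be illustrated with exact rational arithmetic. This is a numerical sanity check, not a proof: the hypothetical function `scaled_tail` only sums a finite number of terms of (2.44) (30 by default, an arbitrary choice), and only small values of $q$ are tried. The bound $1/q$ used in the check comes from comparing the terms of (2.44) with the geometric series $\sum_{k \geq 1} (q+1)^{-k} = 1/q$, which is at most 1/2 for $q \geq 2$.

```python
from fractions import Fraction
from math import factorial


def scaled_tail(q: int, extra_terms: int = 30) -> Fraction:
    """Exact value of q!/(q+1)! + q!/(q+2)! + ... + q!/(q+extra_terms)!,
    i.e. a finite partial sum of the right-hand side of (2.43)."""
    return sum(Fraction(factorial(q), factorial(k))
               for k in range(q + 1, q + extra_terms + 1))


for q in range(2, 11):
    tail = scaled_tail(q)
    # Strictly between 0 and 1/q <= 1/2, so this partial sum is never an integer.
    assert 0 < tail < Fraction(1, q)
    print(q, float(tail))
```

Using `Fraction` rather than floating-point numbers keeps every comparison exact, which matters when the point of the check is that a quantity is not an integer.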
2.2.5 Real Numbers
Definition (#24): The union of the rational and irrational numbers gives the set of "real numbers", denoted by $\mathbb{R}$, so that:
$$\mathbb{Q} \subset \mathbb{R} \tag{2.46}$$
Mathematicians, with their usual rigour, have different techniques for defining the real numbers. They use properties of topology (among others), especially Cauchy sequences, but that is another story that goes beyond the scope of this section. For a set-theoretic definition of $\mathbb{R}$, the reader should refer to the section Set Theory.
Figure 4.10 - Simple number sets summary
We are naturally led to ask whether $\mathbb{R}$ is countable or not. The proof is quite simple.
Proof 4.3.4. By the definition seen above, $\mathbb{R}$ would be countable only if there were a bijection between $\mathbb{N}$ and $\mathbb{R}$. For simplicity, we will show that the interval $[0, 1[$ is not countable, which by extension implies that $\mathbb{R}$ is not countable! The elements of this interval are represented by infinite sequences of digits between 0 and 9 (in the decimal system):
• Some of these sequences are zero from a given rank onward, some are not.
• So we can identify $[0, 1[$ with the set of all sequences (finite or infinite) of integers between 0 and 9.
If this set were countable, we could list these sequences (a first, a second, etc.). The sequence $x_{11}x_{12}x_{13}x_{14}\dots x_{1p}\dots$ would be listed first, and so on, giving the table:
$$\begin{array}{cccccc} x_{11} & x_{12} & x_{13} & \cdots & x_{1p} & \cdots \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2p} & \cdots \\ x_{31} & x_{32} & x_{33} & \cdots & x_{3p} & \cdots \\ \vdots & \vdots & \vdots & & \vdots & \end{array}$$
We could then modify this infinite matrix as follows: to each element of the diagonal we add 1, according to the rule $0 + 1 = 1$, $1 + 1 = 2$, ..., $8 + 1 = 9$ and $9 + 1 = 0$. (To be fully rigorous, one must also avoid producing a sequence that is merely the second decimal expansion of a listed number, as in $0.1999\dots = 0.2000\dots$; a common way is to use only digits other than 0 and 9, for instance replacing each diagonal digit by 5 if it differs from 5 and by 4 otherwise.) Then let us consider the sequence formed by the modified diagonal:
- It cannot be equal to the sequence of the first row of the table, since it differs from it at least in the first element.
- It cannot be equal to the sequence of the second row of the table, since it differs from it at least in the second element.
- It cannot be equal to the sequence of the third row of the table, since it differs from it at least in the third element.
and so on... It therefore cannot be equal to any of the sequences in this table! So whatever listing of the infinite sequences of digits 0...9 is chosen, there is always one that escapes it! It is therefore impossible to number them, simply because they do not form a countable set! □ Q.E.D.
The technique that allowed us to achieve this result is known as the "Cantor diagonal process" (because it is similar to the one used for the equipotence between the natural and rational numbers), and the cardinal of the (uncountable) set of real numbers is named the "power of the continuum".
Remark
We assume that it is intuitive for the reader that any real number can be approximated arbitrarily closely by a rational number (for irrational numbers, we simply stop at a given number of decimals and take the corresponding rational). Mathematicians therefore say that $\mathbb{Q}$ is "dense" in $\mathbb{R}$ and denote this by:
$$\overline{\mathbb{Q}} = \mathbb{R} \tag{2.47}$$
In business, it is common to communicate real numbers as percentages or per-thousand.
Definitions (#25):
D1. Given a scalar $x \in \mathbb{R}$, its expression as a percentage is denoted by:
$$x\% = x \cdot 100 \tag{2.48}$$
(so that, for example, $x = 0.25$ corresponds to 25%).
D2. Given a scalar $x \in \mathbb{R}$, its expression in per-thousand is denoted by:
$$x\,\text{‰} = x \cdot 1{,}000 \tag{2.49}$$
2.2.6 Transfinite Numbers
We now have an infinity of real numbers that is different from that of the natural numbers.
Cantor then dared what no one had dared since Aristotle: the sequence of positive integers being infinite, the set $\mathbb{N}$ has a countable infinity of elements, and he declared that the cardinal (see section Set Theory) of this set was a number that existed as such, without resorting to the catch-all symbol $\infty$. He denoted it:
$$\aleph_0 = \mathrm{Card}(\mathbb{N}) \tag{2.50}$$
This symbol is, as we know (see section Set Theory), the first letter of the Hebrew alphabet; $\aleph_0$ is pronounced "aleph zero". Cantor named this strange number a "transfinite number".
The decisive act is to assert that beyond the finite there is a transfinite, that is to say an unlimited scale of determined modes which by nature are infinite and yet can be specified, as for the finite, by specific numbers that are well defined and distinguishable from each other!! This tool was necessary because the cardinal of a set can be equal to that of one of its parts, as we will see just below!
After this first stroke against ideas held for over two thousand years, Cantor continued on his path and built the calculation rules, paradoxical at first glance, of the transfinite numbers. These rules were based, as we said earlier, on the fact that two infinite sets are equivalent if there exists a bijection between them.
Thus, we can easily show that the infinity of even numbers is equivalent to the infinity of integers: for this, it suffices to note that to every integer we can associate an even number, its double, and vice versa. Therefore the cardinal of the integers is equal to that of the even numbers (the cardinal of a set can be equal to that of one of its parts!). Thus, although the even numbers are included in the set of integers, there are $\aleph_0$ of them, and the two sets are equipotent.
By stating that a set can be equipotent to one of its parts, Cantor went against what seemed obvious to Aristotle and Euclid: the whole is greater than the part!
This would shake the whole of mathematics and lead to the Zermelo-Fraenkel axiomatics that we will see in the section Set Theory.
From the above, Cantor defined the following calculation rules on cardinals:
$$\aleph_0 + 1 = \aleph_0 \qquad \aleph_0 + \aleph_0 = \aleph_0 \qquad \aleph_0^2 = \aleph_0 \tag{2.51}$$
At first glance these rules seem counter-intuitive, but in fact they are not! Indeed, Cantor defined the addition of two transfinite numbers as the cardinal of the disjoint union of the corresponding sets.
Examples:
E1. Denoting by $\aleph_0$ the cardinal of $\mathbb{N}$, the sum $\aleph_0 + \aleph_0$ is the cardinal of the disjoint union of $\mathbb{N}$ with $\mathbb{N}$. But as this disjoint union is equipotent to $\mathbb{N}$, we get $\aleph_0 + \aleph_0 = \aleph_0$ (to be convinced, it is enough to take the sets of odd and even integers, which are both countable and whose disjoint union, $\mathbb{N}$ itself, is also countable).
E2. Another trivial example: $\aleph_0 + 1$ corresponds to the cardinality of $\mathbb{N}$ united with one extra point. This set is still equipotent to $\mathbb{N}$ (shift every integer by one to make room for the new point), therefore $\aleph_0 + 1 = \aleph_0$.
We will also see during our study of the section Set Theory that the Cartesian product of two countable sets is such that:
$$\mathrm{Card}(\mathbb{N} \times \mathbb{N}) = \mathrm{Card}(\mathbb{N}^2) = [\mathrm{Card}(\mathbb{N})]^2 \tag{2.52}$$
and therefore, since $\mathbb{N} \times \mathbb{N}$ is itself countable:
$$\aleph_0^2 = \aleph_0 \tag{2.53}$$
Similarly (see section Set Theory), since $\mathbb{Z} = \mathbb{Z}_{<0} \cup \mathbb{Z}_{\geq 0}$ is the union of two disjoint countable sets, we have:
$$\aleph_0 + \aleph_0 = \aleph_0 \tag{2.54}$$
and, identifying $\mathbb{Q}$ with a part of $\mathbb{Z} \times \mathbb{Z}$ (a numerator over a denominator), we immediately have:
$$\aleph_0 \times \aleph_0 = (\aleph_0)^2 = \aleph_0 \tag{2.55}$$
We can also prove an interesting statement (it is better to have read the section Set Theory beforehand): the cardinality of the set of all subsets of a set $A$ is greater than the cardinal of $A$ itself. Consequently, a hypothetical set of all sets would need a cardinal greater than that of every set, including its own set of subsets, which the statement forbids!
This implies that there is no set containing all sets, since there is always a bigger one (this is an equivalent form of the famous Cantor's paradox)!!!
In technical terms, this means considering a non-empty set $A$ and stating that:
$$\mathrm{Card}(A) < \mathrm{Card}(\mathcal{P}(A)) \tag{2.56}$$
where $\mathcal{P}(A)$ is the set of subsets of $A$ (see the section Set Theory for the general calculation of the cardinal of the set of all parts of a set). By definition of the strict order relation $<$, and since the map $x \mapsto \{x\}$ already injects $A$ into $\mathcal{P}(A)$, it suffices to prove that there is no surjective map $f: A \to \mathcal{P}(A)$; in other words, that at least one element of the set of parts of $A$ has no pre-image in $A$.
Remark
The set $\mathcal{P}(\mathbb{N})$, for example, contains the set of even numbers, the set of odd numbers, $\mathbb{N}$ itself, the empty set, etc. $\mathcal{P}(\mathbb{N})$ is therefore the set of all the "potatoes" (to borrow high school vocabulary...) that can be drawn inside $\mathbb{N}$.
Proof 4.3.5. Suppose that we can number each potato of $\mathcal{P}(A)$ with at least one element of $A$ (imagine this with $\mathbb{N}$, or see the example in the section Set Theory). In other words, suppose that $f: A \to \mathcal{P}(A)$ is surjective, and let us consider the subset $E$ of $A$ defined by:
$$E = \{x \in A \mid x \notin f(x)\} \tag{2.57}$$
that is to say, the set of elements $x$ of $A$ that do not belong to the set numbered by $x$ (in other terms, the element $x$ does not belong to the "potato" that it numbers...). Now, if $f$ is surjective, there must also be a $y \in A$ for this subset $E$ such that:
$$f(y) = E = \{x \in A \mid x \notin f(x)\} \tag{2.58}$$
since $E$ is also a subset of $A$. Suppose that $y$ belongs to $E$. In this case, by definition of $E$ (which applies to every $x$, and in particular to $y$), $y \notin f(y) = E$, a contradiction. Consequently $y \notin E$; but in this second case, again by definition of $E$, $y \in f(y) = E$ (since $y$ is not in $f(y)$), another contradiction. We therefore see that the element $y$ cannot exist, and so $f$ cannot be surjective. We strongly recommend the reader to read the previous sentences more than once if necessary. □ Q.E.D.
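For a finite set $A$ with $n$ elements, the statement of Proof 4.3.5 can even be checked exhaustively, since there are only $(2^n)^n$ maps from $A$ to its power set. The Python sketch below (hypothetical function names, small $n$ only) builds the diagonal set $E$ of (2.57) for every such map and verifies that $E$ is never an image, i.e. that no map is surjective. Of course this says nothing about infinite sets, for which only the proof above applies.

```python
from itertools import chain, combinations, product


def power_set(elements):
    """All subsets of a finite set, as frozensets."""
    items = list(elements)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def diagonal_set(f, A):
    """E = {x in A | x not in f(x)}, as in (2.57); f is a dict from A to subsets of A."""
    return frozenset(x for x in A if x not in f[x])


def check_cantor(A) -> bool:
    """True if, for every map f: A -> P(A), the set E is not an image of f (no f is surjective)."""
    A = list(A)
    subsets = power_set(A)
    for images in product(subsets, repeat=len(A)):
        f = dict(zip(A, images))
        if diagonal_set(f, A) in f.values():
            return False
    return True


for n in range(4):
    A = range(n)
    print(n, len(power_set(A)), check_cantor(A))  # |P(A)| = 2**n and no surjection
```

The exhaustive search grows very quickly ($65{,}536$ maps already for $n = 4$), which is one more reason why the general argument of Proof 4.3.5 is needed.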
2.2.7 Complex Numbers
Invented in the 16th century, among others by Girolamo Cardano and Rafael Bombelli, "complex numbers" (sometimes loosely named "imaginary numbers") are used to solve problems that have no solutions in $\mathbb{R}$, to formalize mathematically certain transformations of the plane such as rotation, similarity, translation, etc., and to generalize some theorems restricted to $\mathbb{R}$ that would otherwise hide results of interest for practical engineering.
For physicists, complex numbers are also a very convenient way to simplify notations. It is thus very difficult to study wave phenomena, General Relativity or quantum mechanics without using complex numbers and expressions.
There are several ways to construct complex numbers. The first is typical of the way mathematicians construct objects within Set Theory: they consider pairs of real numbers and define operations between these pairs to finally arrive at the concept of a complex number. The second is less rigorous, but its approach is simpler: it consists in defining the pure unit imaginary number $i$ and then building the arithmetic operations from its definition. We will opt for the second method in what follows!
Definitions (#26):
1. We define the "unit pure imaginary number", denoted by $i$, by the following property:
$$i^2 = -1 \qquad (\text{often written } i = \sqrt{-1}) \tag{2.59}$$
2. A "complex number" is the sum of a real number $a$ and an imaginary number $ib$, generally written in the following form:
$$z = a + ib \tag{2.60}$$
where $a$ and $b$ are real numbers.
3. We denote the set of complex numbers by $\mathbb{C}$, and by construction we therefore have:
$$\mathbb{R} \subset \mathbb{C} \tag{2.61}$$
Remark
The set $\mathbb{C}$ is identified with the oriented Euclidean plane $E$ (see section Vector Calculus) through the choice of a direct orthonormal basis (we thereby obtain the "Argand-Cauchy plane", also named the "Gauss-Argand plane" or more commonly the "Gauss plane", which we will see a little further below and which seems to have been defined for the first time in 1806).
The set of complex numbers, which constitutes a field (see section Set Theory) denoted by $\mathbb{C}$, is defined (in a simple way to start with) in set-theoretic notation by:
$$\mathbb{C} = \{z = x + iy \mid x, y \in \mathbb{R}\} \tag{2.62}$$
In other words, we say that the field $\mathbb{C}$ is the field $\mathbb{R}$ to which we have adjoined the imaginary number $i$, which is formally denoted by:
$$\mathbb{R}[i] \tag{2.63}$$
The addition and multiplication of complex numbers are internal operations of the set (field) of complex numbers (we will come back in much more detail to certain properties of complex numbers in the section Set Theory) and are defined by:
$$\begin{aligned} z_1 + z_2 &= (x_1 + x_2) + i(y_1 + y_2) \\ z_1 \cdot z_2 &= (x_1 x_2 - y_1 y_2) + i(x_1 y_2 + x_2 y_1) \end{aligned} \tag{2.64}$$
The "real part" of $z$ is traditionally denoted by:
$$\Re(z) = x \tag{2.65}$$
The "imaginary part" of $z$ is traditionally denoted by:
$$\Im(z) = y \tag{2.66}$$
The "conjugate" of $z$ is defined by:
$$\bar{z} = x - iy \tag{2.67}$$
and is sometimes also denoted $z^*$ (particularly in some quantum physics books!).
From a complex number and its conjugate, it is possible to recover its real and imaginary parts through the following obvious relations:
$$\Re(z) = \frac{z + \bar{z}}{2} \quad\text{and}\quad \Im(z) = \frac{z - \bar{z}}{2i} \tag{2.68}$$
The "modulus" of $z$ (or "norm") is its distance from the origin of the Gaussian plane (see the figure of the Gaussian plane further below) and is simply calculated using the Pythagorean theorem:
$$|z| = \sqrt{x^2 + y^2} = \sqrt{z \bar{z}} \tag{2.69}$$
and is always a positive number or equal to zero.
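The formulas (2.64) to (2.69) can be compared with Python's built-in `complex` type, which stores the real and imaginary parts as floating-point numbers. In the sketch below, the hypothetical helper `multiply` applies the product formula (2.64) to the pairs $(x, y)$; small integer values are used so that all the floating-point comparisons are exact.

```python
import math


def multiply(x1, y1, x2, y2):
    """Product formula (2.64): (x1*x2 - y1*y2) + i*(x1*y2 + x2*y1), as a (real, imaginary) pair."""
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)


z1, z2 = complex(3, 4), complex(1, -2)

# (2.64): the hand-written product agrees with Python's built-in complex product.
assert complex(*multiply(z1.real, z1.imag, z2.real, z2.imag)) == z1 * z2

conj = z1.conjugate()                                # (2.67): x - i*y
assert (z1 + conj) / 2 == z1.real                    # (2.68): real part
assert (z1 - conj) / 2j == z1.imag                   # (2.68): imaginary part
assert abs(z1) == math.hypot(z1.real, z1.imag) == 5  # (2.69): |3 + 4i| = 5
assert z1 * conj == abs(z1) ** 2                     # |z|^2 = z * conj(z)

print(z1 * z2)  # (11-2j)
```

With non-integer inputs, floating-point rounding can make such equalities fail in the last digits, so a tolerance (for example `math.isclose`) would then be the safer comparison.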
We consider as obvious that it satisfies all the properties of a distance (see sections Topology and Vector Calculus).

Remark: The notation $|z|$ for the module is not innocent. When $z$ is real, $|z|$ coincides with the absolute value.

The division of two complex numbers is calculated as follows (the denominator being obviously non-zero):
$$ \frac{z_1}{z_2} = \frac{x_1 + iy_1}{x_2 + iy_2} = \frac{x_1 + iy_1}{x_2 + iy_2}\cdot\frac{x_2 - iy_2}{x_2 - iy_2} = \frac{(x_1 x_2 + y_1 y_2) - i(x_1 y_2 - x_2 y_1)}{x_2^2 + y_2^2} $$
The inverse of a complex number is calculated similarly:
$$ \frac{1}{z} = \frac{1}{x + iy} = \frac{x - iy}{(x + iy)(x - iy)} = \frac{x - iy}{x^2 + y^2} = \frac{x}{x^2 + y^2} - i\,\frac{y}{x^2 + y^2} $$
We can therefore list 8 important properties of the module and the complex conjugate:

P1. We affirm that:
$$ |z| = 0 \iff z = 0 $$
Proof 4.3.6. By definition of the module, $|z| = \sqrt{x^2 + y^2}$. For the sum $x^2 + y^2$ to be zero, since $(x, y) \in \mathbb{R}^2$, the necessary condition is:
$$ x = y = 0 $$
Q.E.D.

P2. We affirm that:
$$ |z| = |-z| = |\bar{z}| $$
Proof 4.3.7. This is immediate:
$$ |z| = \sqrt{x^2 + y^2} = \sqrt{(-x)^2 + (-y)^2} = |-z| \qquad\text{and}\qquad |\bar{z}| = \sqrt{x^2 + (-y)^2} = |z| $$
Q.E.D.

P3. We affirm that:
$$ |\Re(z)| \leq |z| \text{ with equality iff } z \text{ is real}, \qquad |\Im(z)| \leq |z| \text{ with equality iff } z \text{ is purely imaginary} $$
Proof 4.3.8. The two inequalities above can be written:
$$ |x| \leq \sqrt{x^2 + y^2}, \qquad |y| \leq \sqrt{x^2 + y^2} $$
Both sides being non-negative, they are respectively equivalent to:
$$ x^2 \leq x^2 + y^2, \qquad y^2 \leq x^2 + y^2 $$
These are trivial, with equality iff $y = 0$ (respectively $x = 0$). The rest of the proof is therefore trivial!
Q.E.D.

P4. We have:
$$ \forall (z_1, z_2) \in \mathbb{C}\times\mathbb{C}, \quad |z_1 z_2| = |z_1|\,|z_2| $$
and, if $z_2 \neq 0$:
$$ \left|\frac{z_1}{z_2}\right| = \frac{|z_1|}{|z_2|} $$
Proof 4.3.9.
First:
$$ |z_1 z_2|^2 = (z_1 z_2)\overline{(z_1 z_2)} = (z_1 z_2)(\bar{z}_1 \bar{z}_2) = (z_1 \bar{z}_1)(z_2 \bar{z}_2) = |z_1|^2 |z_2|^2 \;\Rightarrow\; |z_1 z_2| = |z_1|\,|z_2| $$
We will prove a little further below that in general $\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$. For $z_2 \neq 0$:
$$ \left|\frac{z_1}{z_2}\right|^2 = \frac{z_1}{z_2}\,\overline{\left(\frac{z_1}{z_2}\right)} = \frac{z_1}{z_2}\cdot\frac{\bar{z}_1}{\bar{z}_2} = \frac{z_1 \bar{z}_1}{z_2 \bar{z}_2} = \frac{|z_1|^2}{|z_2|^2} $$
Taking the square root finishes the proof.
Q.E.D.

P5. We affirm that:
$$ |z|^2 = z\bar{z} $$
Proof 4.3.10. This is immediate:
$$ |z|^2 = \left(\sqrt{x^2 + y^2}\right)^2 = x^2 + y^2 = x^2 - ixy + iyx + y^2 = (x + iy)(x - iy) = z\bar{z} $$
Q.E.D.

P6. We affirm that:
$$ \forall z, z' \in \mathbb{C}: \quad \overline{\bar{z}} = z, \qquad \overline{z + z'} = \bar{z} + \bar{z}', \qquad \overline{z z'} = \bar{z}\,\bar{z}' $$
Proof 4.3.11. The first one is immediate:
$$ \overline{\bar{z}} = \overline{x - iy} = x + iy = z $$
Next, with $z = x_1 + iy_1$ and $z' = x_2 + iy_2$:
$$ \overline{z + z'} = \overline{(x_1 + x_2) + i(y_1 + y_2)} = (x_1 + x_2) - i(y_1 + y_2) = (x_1 - iy_1) + (x_2 - iy_2) = \bar{z} + \bar{z}' $$
and:
$$ \overline{z z'} = \overline{(x_1 x_2 - y_1 y_2) + i(x_1 y_2 + y_1 x_2)} = (x_1 x_2 - y_1 y_2) - i(x_1 y_2 + y_1 x_2) = (x_1 - iy_1)(x_2 - iy_2) = \bar{z}\,\bar{z}' $$
Q.E.D.

Remarks:
R1. In mathematical terms, the first proof shows that complex conjugation is what is named an "involution": applying it twice changes nothing.
R2. Also in mathematical terms (it is only vocabulary!), the second proof concerns conjugation as a bijection compatible with the addition of two complex numbers. It shows that conjugation is what we name a "group automorphism of $(\mathbb{C}, +)$" (see section Set Theory).
R3. Again for vocabulary, the third proof shows that conjugation is also compatible with the product of two complex numbers. It is therefore what we name a "field automorphism of $(\mathbb{C}, +, \times)$" (see section Set Theory).

P7. We affirm that, for $z'$ different from zero:
$$ \overline{\left(\frac{1}{z'}\right)} = \frac{1}{\bar{z}'} \qquad\text{and}\qquad \overline{\left(\frac{z}{z'}\right)} = \frac{\bar{z}}{\bar{z}'} $$
Proof 4.3.12. We will restrict ourselves to the proof of the second relation, which is a general case of the first (for $z = 1$). With $z = x_1 + iy_1$ and $z' = x_2 + iy_2$:
$$ \frac{x_1 + iy_1}{x_2 + iy_2} = \frac{(x_1 + iy_1)(x_2 - iy_2)}{(x_2 + iy_2)(x_2 - iy_2)} = \frac{(x_1 x_2 + y_1 y_2) + i(y_1 x_2 - y_2 x_1)}{x_2^2 + y_2^2} = \frac{x_1 x_2 + y_1 y_2}{x_2^2 + y_2^2} + i\,\frac{y_1 x_2 - y_2 x_1}{x_2^2 + y_2^2} $$
so that:
$$ \overline{\left(\frac{z}{z'}\right)} = \frac{(x_1 x_2 + y_1 y_2) + i(y_2 x_1 - y_1 x_2)}{x_2^2 + y_2^2} = \frac{(x_1 - iy_1)(x_2 + iy_2)}{(x_2 - iy_2)(x_2 + iy_2)} = \frac{x_1 - iy_1}{x_2 - iy_2} = \frac{\bar{z}}{\bar{z}'} $$
Q.E.D.

P8. We have:
$$ |z_1 + z_2| \leq |z_1| + |z_2| $$
This holds for any complex numbers $z_1, z_2$. Strictly speaking, they should be non-zero for the equality case below, otherwise the concept of argument of a complex number, which we will see further below, is undetermined. Furthermore, equality holds if and only if $z_1$ and $z_2$ are collinear (the vectors are "on the same straight line") and of the same direction. In other words, equality holds if there exists $\lambda > 0$ such that $\lambda z_1 = z_2$.

Proof 4.3.13. Using P3 and P4, namely $\Re(z_1\bar{z}_2) \leq |z_1\bar{z}_2| = |z_1|\,|z_2|$, we directly have:
$$ |z_1 + z_2|^2 = |z_1|^2 + 2\Re(z_1 \bar{z}_2) + |z_2|^2 \leq |z_1|^2 + 2|z_1|\,|z_2| + |z_2|^2 = (|z_1| + |z_2|)^2 $$
This inequality may not be obvious to everyone, so let us develop it a bit. We assume it true and show that it is equivalent to a true statement:
$$ |z_1 + z_2|^2 \leq (|z_1| + |z_2|)^2 $$
$$ (x_1 + x_2)^2 + (y_1 + y_2)^2 \leq \left(\sqrt{x_1^2 + y_1^2} + \sqrt{x_2^2 + y_2^2}\right)^2 $$
$$ x_1^2 + 2x_1x_2 + x_2^2 + y_1^2 + 2y_1y_2 + y_2^2 \leq (x_1^2 + y_1^2) + 2\sqrt{x_1^2 + y_1^2}\sqrt{x_2^2 + y_2^2} + (x_2^2 + y_2^2) $$
After simplification:
$$ x_1x_2 + y_1y_2 \leq \sqrt{x_1^2 + y_1^2}\sqrt{x_2^2 + y_2^2} $$
If the left-hand side is negative, this is trivially true. Otherwise, both sides are non-negative and we can square them:
$$ (x_1x_2 + y_1y_2)^2 \leq (x_1^2 + y_1^2)(x_2^2 + y_2^2) $$
$$ x_1^2x_2^2 + 2x_1x_2y_1y_2 + y_1^2y_2^2 \leq x_1^2x_2^2 + x_1^2y_2^2 + y_1^2x_2^2 + y_1^2y_2^2 $$
and again after simplification:
$$ 2x_1x_2y_1y_2 \leq x_1^2y_2^2 + y_1^2x_2^2 \iff 0 \leq x_1^2y_2^2 - 2x_1x_2y_1y_2 + y_1^2x_2^2 = (x_1y_2 - y_1x_2)^2 $$
As a square is necessarily positive or zero, it follows that:
$$ 0 \leq (x_1y_2 - y_1x_2)^2 $$
This last relation thus shows that the inequality is true.
Q.E.D.

Remark: In fact there is a more general form of this inequality, named the "Minkowski inequality", proved in the section of Vector Calculus. Complex numbers can indeed be written in the form of vectors, as we will see later.
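Several results above can be spot-checked on random values: the division formula, the conjugation rules P6 and P7, the multiplicativity of the module P4 and the triangle inequality P8. This is a minimal Python sketch, not a proof. `divide` is a hypothetical helper implementing the formula by multiplication with the conjugate, and floating-point comparisons use a small tolerance.

```python
import math
import random


def divide(z1, z2):
    """z1 / z2 computed as z1 * conj(z2) / |z2|^2 (z2 must be non-zero)."""
    x1, y1, x2, y2 = z1.real, z1.imag, z2.real, z2.imag
    d = x2 * x2 + y2 * y2
    if d == 0:
        raise ZeroDivisionError("division by the complex number 0")
    return complex((x1 * x2 + y1 * y2) / d, -(x1 * y2 - x2 * y1) / d)


def close(u, v, tol=1e-9):
    """Relative closeness test for complex or real values."""
    return abs(u - v) <= tol * max(1.0, abs(u), abs(v))


def check_properties(trials=1000, seed=0):
    """Spot-check the division formula, P4, P6, P7 and P8 on random values."""
    rng = random.Random(seed)
    for _ in range(trials):
        z1 = complex(rng.uniform(-10, 10), rng.uniform(-10, 10))
        z2 = complex(rng.uniform(-10, 10), rng.uniform(-10, 10))
        assert close(divide(z1, z2), z1 / z2)                                 # division
        assert close(abs(z1 * z2), abs(z1) * abs(z2))                          # P4
        assert close((z1 * z2).conjugate(), z1.conjugate() * z2.conjugate())  # P6
        assert close((z1 / z2).conjugate(), z1.conjugate() / z2.conjugate())  # P7
        assert abs(z1 + z2) <= abs(z1) + abs(z2) + 1e-12                      # P8
    # Equality case of P8: z2 = lambda * z1 with lambda > 0.
    z = complex(1.5, -2.0)
    assert close(abs(z + 2.5 * z), abs(z) + abs(2.5 * z))
    return True


print(divide(1 + 2j, 3 - 4j))  # (-0.2+0.4j)
print(check_properties())      # True
```

Random testing only illustrates the properties. The algebraic proofs above are what establish them.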
2.2.7.1 Geometric Interpretation of Complex Numbers

We can also represent any complex number $a + ib$ or $a - ib$ in a plane defined by two orthogonal axes of infinite length. The vertical axis represents the imaginary part of a complex number and the horizontal axis the real part (see figure below). So there is a correspondence between the set of complex numbers and the set of vectors of the Gaussian plane. This is the notion of affix, which we will see in more depth in the section of Vector Calculus. We sometimes name this type of representation "Gauss plane" or "Gauss map", and we then write:
$$ \mathrm{Aff}(\vec{r}) = a + ib $$
We see on this diagram that a complex number thus has a vector interpretation (see section Vector Calculus) given by:
$$ z = a + ib \quad\longleftrightarrow\quad \vec{r} = a\,\vec{e}_1 + b\,\vec{e}_2 $$
where the canonical basis is defined by:
$$ \vec{e}_1 = \begin{pmatrix}1\\0\end{pmatrix}, \qquad \vec{e}_2 = \begin{pmatrix}0\\1\end{pmatrix} $$
with:
$$ r = |z| = \sqrt{a^2 + b^2} $$
Thus $\vec{e}_1$ is the unit basis vector carried by the horizontal real axis $\mathbb{R}$. $\vec{e}_2$ is the unit basis vector carried by the vertical imaginary axis $i\mathbb{R}$. Finally, $r$ is the module (the norm), which is positive or zero.

This has to be compared with the vectors of $\mathbb{R}^2$ (see section Vector Calculus):
$$ \vec{v} = x\,\vec{e}_1 + y\,\vec{e}_2 = x\begin{pmatrix}1\\0\end{pmatrix} + y\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}x\\y\end{pmatrix} $$
with:
$$ \|\vec{v}\| = \sqrt{x^2 + y^2} $$
so that we can identify the complex plane with the Euclidean plane.

Thanks to the geometric interpretation of the Gaussian plane, the equality below, for example, is immediate and avoids some developments:
$$ |a + bi| = |b + ai| $$
In addition, the definitions of the cosine and sine (see section Trigonometry) give us:
$$ a = r\cos(\varphi), \qquad b = r\sin(\varphi) $$
Finally:
$$ r = \sqrt{a^2 + b^2}, \qquad \varphi = \cos^{-1}\left(\frac{a}{r}\right) $$
The latter holds for $b \geq 0$; for $b < 0$ one takes $\varphi = -\cos^{-1}(a/r)$. Therefore:
$$ z = a + ib = r\cos(\varphi) + ir\sin(\varphi) = r\big(\cos(\varphi) + i\sin(\varphi)\big) = r\,\mathrm{cis}(\varphi) $$
By the properties of trigonometric functions, this complex number is unchanged when its angle is taken modulo $2\pi$:
$$ z = r\big(\cos(\varphi) + i\sin(\varphi)\big) = r\big(\cos(\varphi + 2k\pi) + i\sin(\varphi + 2k\pi)\big) $$
Here $k \in \mathbb{Z}$, and $\varphi$ is named the "argument of $z$". It is traditionally denoted by:
$$ \arg(z) $$
The properties of the cosine and sine (see section Trigonometry) lead us directly to write for the argument (modulo $2\pi$):
$$ \arg(\bar{z}) = -\arg(z) \qquad\text{and}\qquad \arg(-z) = \arg(z) + \pi $$
We also prove, among other things, with Taylor series (see section Sequences and Series) that:
$$ \cos(\varphi) = 1 - \frac{\varphi^2}{2!} + \frac{\varphi^4}{4!} - \ldots + (-1)^k\frac{\varphi^{2k}}{(2k)!} + \ldots $$
and:
$$ \sin(\varphi) = \varphi - \frac{\varphi^3}{3!} + \frac{\varphi^5}{5!} - \ldots + (-1)^k\frac{\varphi^{2k+1}}{(2k+1)!} + \ldots $$
The sum $\cos(\varphi) + i\sin(\varphi)$ is similar to:
$$ e^{\varphi} = 1 + \varphi + \frac{\varphi^2}{2!} + \frac{\varphi^3}{3!} + \ldots + \frac{\varphi^k}{k!} + \ldots $$
but is in fact perfectly identical to the Taylor expansion of $e^{i\varphi}$:
$$ e^{i\varphi} = 1 + i\varphi - \frac{\varphi^2}{2!} - i\frac{\varphi^3}{3!} + \ldots + i^k\frac{\varphi^k}{k!} + \ldots = \cos(\varphi) + i\sin(\varphi) $$
So finally we can write:
$$ z = r\big(\cos(\varphi) + i\sin(\varphi)\big) = re^{i\varphi} $$
This relation is named "Euler's formula". Using the parity properties of trigonometric functions:
$$ \cos(\varphi) + i\sin(\varphi) = e^{i\varphi}, \qquad \cos(\varphi) - i\sin(\varphi) = e^{-i\varphi} $$
Depending on whether we sum or subtract these, we get the "Euler formulas" or "Moivre and Euler formulas":
$$ \cos(\varphi) = \frac{e^{i\varphi} + e^{-i\varphi}}{2}, \qquad \sin(\varphi) = \frac{e^{i\varphi} - e^{-i\varphi}}{2i} $$
Note that the angle can even be a complex number! This is to say that, in full generality, trigonometric functions can be considered as functions from $\mathbb{C}$ to $\mathbb{C}$.

The exponential form of a complex number is very commonly used in many fields of physics and engineering. Thanks to it, we can easily derive relations, starting from the following (remember that cis is an old notation standing for $\cos(\varphi) + i\sin(\varphi)$, with the angle in the parentheses):
$$ z = r\big(\cos(\varphi) + i\sin(\varphi)\big) = r\,\mathrm{cis}(\varphi) = re^{i\varphi} $$
$$ z_1 = r_1\big(\cos(\varphi_1) + i\sin(\varphi_1)\big) = r_1\,\mathrm{cis}(\varphi_1) = r_1e^{i\varphi_1} $$
$$ z_2 = r_2\big(\cos(\varphi_2) + i\sin(\varphi_2)\big) = r_2\,\mathrm{cis}(\varphi_2) = r_2e^{i\varphi_2} $$
Assuming the basic trigonometric identities known (see section Trigonometry), we have the following relations for the multiplication of two complex numbers:
$$ z_1z_2 = r_1r_2\big[\cos(\varphi_1 + \varphi_2) + i\sin(\varphi_1 + \varphi_2)\big] = r_1e^{i\varphi_1}\,r_2e^{i\varphi_2} = r_1r_2\,\mathrm{cis}(\varphi_1 + \varphi_2) = r_1r_2\,e^{i(\varphi_1 + \varphi_2)} $$
therefore:
$$ \arg(z_1z_2) = \arg(z_1) + \arg(z_2) $$
and therefore, if $n$ is a positive integer:
$$ \arg(z^n) = n\arg(z) $$
For the module (norm) of the multiplication:
$$ |z_1z_2| = \left|r_1e^{i\varphi_1}r_2e^{i\varphi_2}\right| = \left|r_1r_2e^{i(\varphi_1 + \varphi_2)}\right| = r_1r_2 = |z_1|\,|z_2| $$
Therefore:
$$ |z^m| = |z|^m $$
For the division of two complex numbers:
$$ \frac{z_1}{z_2} = \frac{r_1}{r_2}\big[\cos(\varphi_1 - \varphi_2) + i\sin(\varphi_1 - \varphi_2)\big] = \frac{r_1}{r_2}\,\mathrm{cis}(\varphi_1 - \varphi_2) = \frac{r_1e^{i\varphi_1}}{r_2e^{i\varphi_2}} = \frac{r_1}{r_2}\,e^{i(\varphi_1 - \varphi_2)} $$
The module of their quotient then comes immediately:
$$ \left|\frac{z_1}{z_2}\right| = \frac{r_1}{r_2} = \frac{|z_1|}{|z_2|} $$
and for the argument:
$$ \arg\left(\frac{z_1}{z_2}\right) = \arg\left(\frac{r_1e^{i\varphi_1}}{r_2e^{i\varphi_2}}\right) = \arg\left(\frac{r_1}{r_2}e^{i(\varphi_1 - \varphi_2)}\right) = \varphi_1 - \varphi_2 = \arg(z_1) - \arg(z_2) $$
and it comes immediately:
$$ \arg(z^{-1}) = \arg\left((re^{i\varphi})^{-1}\right) = \arg\left(\frac{1}{r}e^{-i\varphi}\right) = -\varphi = -\arg(z) $$
For the power of a complex number (or root):
$$ z^m = r^me^{im\varphi} = r^m\big[\cos(m\varphi) + i\sin(m\varphi)\big] = r^m\,\mathrm{cis}(m\varphi) $$
which immediately gives us a result already proved previously:
$$ |z^m| = r^m = |z|^m $$
and for the argument:
$$ \arg(z^m) = \arg\left((re^{i\varphi})^m\right) = \arg\left(r^me^{im\varphi}\right) = m\varphi = m\arg(z) $$
In case we have a unit module (norm equal to 1), i.e. $z = \cos(\varphi) + i\sin(\varphi)$, we then have the relation:
$$ \big(\cos(\varphi) + i\sin(\varphi)\big)^m = \left(e^{i\varphi}\right)^m = e^{im\varphi} = \cos(m\varphi) + i\sin(m\varphi) $$
named the "De Moivre formula".

For the natural logarithm of a complex number, we trivially have the following relation, discussed in the section of Analysis:
$$ \ln(z) = \ln\left(re^{i\varphi}\right) = \ln(r) + i\varphi $$
This is defined up to a multiple of $2i\pi$, since $\varphi$ itself is only defined modulo $2\pi$. In the complex case, $\ln(z)$ is often written $\mathrm{Log}(z)$ with an uppercase "L".
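The polar relations above can be illustrated with Python's standard `cmath` module. This is a minimal sketch under the assumption that floating-point agreement within `cmath.isclose`'s default tolerance is an acceptable check. `check_polar` is a hypothetical helper, and `cmath.rect(r, phi)` builds $re^{i\varphi}$. The logarithm check relies on `cmath.log` returning the principal value, so it is only run with $-\pi < \varphi_1 \leq \pi$.

```python
import cmath
import math


def check_polar(r1, phi1, r2, phi2, m):
    """Check Euler, product, quotient, De Moivre and log relations numerically."""
    z1 = cmath.rect(r1, phi1)  # r1 * e^{i*phi1}
    z2 = cmath.rect(r2, phi2)  # r2 * e^{i*phi2}
    # Euler's formula: e^{i*phi} = cos(phi) + i*sin(phi)
    assert cmath.isclose(cmath.exp(1j * phi1), complex(math.cos(phi1), math.sin(phi1)))
    # Euler formula for the cosine, here with a genuinely complex angle
    w = complex(phi1, phi2)
    assert cmath.isclose(cmath.cos(w), (cmath.exp(1j * w) + cmath.exp(-1j * w)) / 2)
    # Product: modules multiply, arguments add
    assert cmath.isclose(z1 * z2, cmath.rect(r1 * r2, phi1 + phi2))
    # Quotient: modules divide, arguments subtract
    assert cmath.isclose(z1 / z2, cmath.rect(r1 / r2, phi1 - phi2))
    # De Moivre: (cos(phi) + i sin(phi))^m = cos(m phi) + i sin(m phi)
    u = complex(math.cos(phi1), math.sin(phi1))
    assert cmath.isclose(u ** m, complex(math.cos(m * phi1), math.sin(m * phi1)))
    # Logarithm: ln(r) + i*phi (principal value, valid because -pi < phi1 <= pi)
    assert cmath.isclose(cmath.log(z1), complex(math.log(r1), phi1))
    return True


print(check_polar(2.0, 0.7, 1.5, -2.1, 5))  # True
```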
All the previous relations could of course be obtained with the trigonometric form of complex numbers, but this requires some additional lines of mathematical developments.

2.2.7.1.1 Fresnel Vectors (phasors)

A sinusoidal variation $f(t) = r\sin(\omega t)$ can be represented as a projection (see section Trigonometry) on the vertical $y$-axis, which is the imaginary axis of the set $\mathbb{C}$. It is the projection of a vector $\vec{r}$ rotating at angular velocity $\omega$ around the origin in the plane $xOy$. Such a rotating vector is named a "Fresnel vector". It corresponds to the complex number:
$$ \vec{r} = \Re(\vec{r}) + i\,\Im(\vec{r}) = r\big(\cos(\omega t) + i\sin(\omega t)\big) = re^{i\omega t} = re^{i\varphi} $$
with $\varphi = \omega t$, and $f(t)$ is its imaginary part. That is to say:
$$ f(t) = \Im\left(re^{i\omega t}\right) $$

Figure 4.13 - Fresnel rotating vector

We will see the phasor again explicitly in our study of wave mechanics and geometrical optics (as part of diffraction), in the sections with the corresponding names.

2.2.7.2 Transformation in the plane

It is customary to represent real numbers as points on a graduated line. The algebraic operations have a geometric interpretation on it: addition is a translation, multiplication a centered scaling. In particular, we can talk about the "square root of a transformation". A translation of amplitude $T$ may be obtained as the iteration of a translation of amplitude $T/2$. Similarly, a scaling of factor $S$ can be achieved as an iterated scaling of factor $\sqrt{S}$. In particular, a homothety (scaling) of factor 9 can be composed of two homotheties of factor 3 (or -3). We can then say that the square root takes on a geometric sense.

But what about the square root of negative numbers? In particular, the square root of -1? A scaling of factor -1 can be seen as a symmetry with respect to the origin. But if we see this transformation in a continuous manner, a scaling of factor -1 can also be seen as a rotation of $\pi$ around the origin. So the problem of the square root of a negative number is simplified.
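A small numerical illustration of the Fresnel vector follows. The rotating complex number $re^{i\omega t}$ keeps the constant length $r$, and its projection on the imaginary axis reproduces $f(t) = r\sin(\omega t)$. This is a minimal Python sketch; the function name `fresnel_vector` and the sample values of $r$, $\omega$ and $t$ are hypothetical.

```python
import cmath
import math


def fresnel_vector(r, omega, t):
    """Complex number r * e^{i*omega*t}: the rotating (Fresnel) vector at time t."""
    return r * cmath.exp(1j * omega * t)


r, omega = 2.0, 3.0
for t in (0.0, 0.1, 0.5, 1.7):
    v = fresnel_vector(r, omega, t)
    # Projection on the vertical (imaginary) axis gives the sinusoid r*sin(omega*t)
    assert math.isclose(v.imag, r * math.sin(omega * t), abs_tol=1e-12)
    # The vector rotates without changing its length
    assert math.isclose(abs(v), r)
    # cmath.phase returns the angle omega*t reduced to the interval (-pi, pi]
    print(f"t={t:.1f}  f(t)={v.imag:+.6f}  angle={cmath.phase(v):+.6f}")
```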
Indeed, it is not difficult to break down a rotation of $\pi$ radians into two transformations: we can repeat either a rotation of $\pi/2$ or one of $-\pi/2$. The image of 1 under such a quarter turn is a square root of -1. Thus $i$ is situated on the perpendicular to the real axis at the origin, at a distance 1, either up or down. Having positioned the number $i$, it is no longer difficult to place the other complex numbers in the Gauss plane. We can therefore associate with $2i$ the product of the scaling of factor 2 (see section Euclidean Geometry) by the rotation of center $O$ and angle $\pi/2$. That is to say, a similarity centered at the origin. This is what we will endeavor to prove now.

Given:
$$ z_1 = x_1 + iy_1 = ae^{i\alpha}, \qquad z_2 = x_2 + iy_2 = be^{i\beta} $$
we have the following geometric transformation properties for complex numbers (see section Trigonometry for the properties of sine and cosine). We can happily combine them at our discretion:

P1. In the Gauss plane, the multiplication of $z_1$ by a real number $\lambda$ corresponds trivially to a homothety of center $O$ and of ratio $\lambda$. Recall that $O$ is the intersection of the real and imaginary axes. Indeed:
$$ \lambda z_1 = (\lambda a)e^{i\alpha} $$

P2. Multiplying $z_1$ by a complex number $z_0 = e^{i\omega}$ of unit module corresponds to a rotation of center $O$. The angle of this rotation is the argument $\omega$ of $z_0$. Indeed:
$$ z_0z_1 = e^{i\omega}ae^{i\alpha} = ae^{i(\alpha + \omega)} $$
Remark: We then see immediately, for example, that multiplying a complex number by $i$ corresponds to a rotation of $\pi/2$. Indeed, $i$ is the complex number of unit module with $\sin(\omega) = 1$, $\cos(\omega) = 0$.

Theorem 4.4. It is interesting to notice that, in vector form, the rotation of center $O$ of $z_1$ by $z_0 = x_0 + iy_0$ can be written using the following matrix:
$$ \begin{pmatrix} x_0 & -y_0 \\ y_0 & x_0 \end{pmatrix} $$
Proof 4.4.1. We have just seen that $z_0z_1$ is a rotation of center $O$ of $z_1$ by the angle $\omega$. We just need to write it first in the old style:
$$ z_0z_1 = (x_0 + iy_0)(x_1 + iy_1) = (x_0x_1 - y_0y_1) + i(x_0y_1 + y_0x_1) $$
giving in vector form:
$$ z_0z_1 \;\longleftrightarrow\; (x_0x_1 - y_0y_1)\begin{pmatrix}1\\0\end{pmatrix} + (x_0y_1 + y_0x_1)\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix} x_0x_1 - y_0y_1 \\ x_0y_1 + y_0x_1 \end{pmatrix} $$
thus the equivalent linear application is:
$$ \begin{pmatrix} x_0 & -y_0 \\ y_0 & x_0 \end{pmatrix}\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} x_0x_1 - y_0y_1 \\ x_0y_1 + y_0x_1 \end{pmatrix} $$
We can also write this differently, falling back on the rotation matrix in the plane, which we will see in the section of Euclidean Geometry. This is a remarkable result! We use:
$$ z = r\big(\cos(\varphi) + i\sin(\varphi)\big) = r\big(\cos(\varphi + 2k\pi) + i\sin(\varphi + 2k\pi)\big) $$
We take the particular case where $r$ is unitary, in order to have a pure rotation, and use the same notation $\omega$ for the angle as in the chapter Geometry:
$$ z_0 = \cos(\omega + 2k\pi) + i\sin(\omega + 2k\pi) = x_0 + iy_0 $$
Since $\cos(\omega) = x_0$ and $\sin(\omega) = y_0$, we immediately have:
$$ \begin{pmatrix} x_0 & -y_0 \\ y_0 & x_0 \end{pmatrix}\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} x_0x_1 - y_0y_1 \\ x_0y_1 + y_0x_1 \end{pmatrix} = \begin{pmatrix} \cos(\omega)x_1 - \sin(\omega)y_1 \\ \cos(\omega)y_1 + \sin(\omega)x_1 \end{pmatrix} $$
Note that the rotation matrix can also be written as:
$$ \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix} = \cos(\omega)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \sin(\omega)\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \cos(\omega)I + \sin(\omega)J $$
and as well:
$$ \begin{pmatrix} x_0 & -y_0 \\ y_0 & x_0 \end{pmatrix} = x_0\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + y_0\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = x_0\cdot I + y_0\cdot J $$
Q.E.D.

Thus we see that rotation matrices are not only applications but also complex numbers. It was obvious from the start, but we had to show it in an aesthetic and simple way. So, by usage, we set:
$$ 1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad\text{and}\qquad i = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} $$
or, with another notation common in linear algebra:
$$ 1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad\text{and}\qquad i = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} $$
The field of complex numbers is isomorphic to the field of real square matrices of dimension 2 of the type:
$$ \begin{pmatrix} x_0 & -y_0 \\ y_0 & x_0 \end{pmatrix} $$
We use this result many times in various sections of this book, for specific studies in algebra, geometry and relativistic quantum physics.

P3. The multiplication of two complex numbers corresponds to a homothety combined with a rotation, in other words a "direct similarity".
Proof 4.4.2.
$$ z_1z_2 = ae^{i\alpha}\,be^{i\beta} = (ab)e^{i(\alpha + \beta)} $$
so, applied to $z_1$, this is indeed a similarity of ratio $b$ and angle $\beta$.
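The identification of $x_0 + iy_0$ with the matrix $\begin{pmatrix} x_0 & -y_0 \\ y_0 & x_0 \end{pmatrix}$ can be checked with a few lines of plain Python. No matrix library is assumed; `as_matrix`, `mat_mul` and `mat_vec` are illustrative helpers. The sketch shows three things: the matrix of a product is the product of the matrices; the matrix of $i$ squares to $-1$; and applying the matrix of $z_0 = e^{i\omega}$ to $(x_1, y_1)$ gives the coordinates of $z_0z_1$.

```python
import cmath
import math


def as_matrix(z):
    """2x2 real matrix [[x, -y], [y, x]] associated with z = x + iy."""
    return [[z.real, -z.imag], [z.imag, z.real]]


def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def mat_vec(A, v):
    """Product of a 2x2 matrix with a column vector (x, y)."""
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])


z0, z1 = complex(2, 3), complex(-1, 4)
print(mat_mul(as_matrix(z0), as_matrix(z1)))  # [[-14.0, -5.0], [5.0, -14.0]]
print(as_matrix(z0 * z1))                     # [[-14.0, -5.0], [5.0, -14.0]]
print(mat_mul(as_matrix(1j), as_matrix(1j)))  # i*i = -1 as a matrix

# Rotation of (x1, y1) by the angle omega through z0 = e^{i*omega}
omega = math.pi / 3
rot = cmath.exp(1j * omega)
x, y = mat_vec(as_matrix(rot), (z1.real, z1.imag))
assert math.isclose(x, (rot * z1).real) and math.isclose(y, (rot * z1).imag)
```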
At the opposite, the following operation:
$$ z_1\bar{z}_2 = ae^{i\alpha}\,be^{-i\beta} = (ab)e^{i(\alpha - \beta)} $$
will be named a "retrograde linear similarity". Otherwise, the proof trivially returns an already known relation:
$$ \arg(z_1z_2) = \arg(z_1) + \arg(z_2) $$
Q.E.D.

Remarks:
R1. The sum of two complex numbers $z_1 + z_2$ has no special simplified notation in any (polar) form whatsoever. We therefore say that the resulting quantity is equivalent to an "amplitude translation".
R2. A direct linear similarity is a multiplication of two complex numbers. An amplitude translation is a sum with a third complex number. The combination of the two is what we name a "direct similarity".

P4. The conjugate of a complex number is geometrically its symmetric with respect to the real axis. With $r = |z_1|$:
$$ \bar{z}_1 = x_1 - iy_1 = r\big(\cos(\alpha) - i\sin(\alpha)\big) = r\big(\cos(-\alpha) + i\sin(-\alpha)\big) = re^{-i\alpha} $$
without forgetting the basics of trigonometry:
$$ \cos(\varphi) = \cos(\varphi + 2k\pi), \qquad \sin(\varphi) = \sin(\varphi + 2k\pi) $$
This gives us a known result:
$$ \arg(\bar{z}_1) = -\arg(z_1) $$
From this, writing $z_1 = r(\cos\varphi + i\sin\varphi)$, we also get the following property:
$$ r\big(\cos(\varphi + \pi) + i\sin(\varphi + \pi)\big) = r(-\cos\varphi - i\sin\varphi) = -r(\cos\varphi + i\sin\varphi) = -z_1 $$
Hence:
$$ \arg(z_1) + \pi = \arg(-z_1) $$

P5. The negation of the conjugate of a complex number is geometrically its symmetric with respect to the imaginary axis:
$$ -\bar{z}_1 = -x_1 + iy_1 = r(-\cos\alpha + i\sin\alpha) = r\big(\cos(\pi - \alpha) + i\sin(\pi - \alpha)\big) $$
Remarks:
R1. The combination of the properties P4 and P5 is named a "retrograde similarity".
R2. Taking the inverse of the conjugate of a complex number, that is to say $1/\bar{z}$, is a geometric operation named a "pole inversion".

P6. The rotation of center $c$ and angle $\varphi$ is given and denoted by:
$$ \mathcal{R}(z_1) = c + e^{i\varphi}(z_1 - c) $$
Some explanations could be useful for some readers:
- The complex number $c$ gives a point of the Gaussian plane, which will be the center of rotation.
- The difference $z_1 - c$ gives the chosen radius $r$.
- The multiplication by $e^{i\varphi}$ is the counterclockwise rotation of this radius about the origin of the Gaussian plane.
- Finally, adding $c$ is the translation needed to bring the rotated radius back to its original place (center $c$).

Which gives schematically:

Figure 4.14 - Representation of the complex rotation

P7. On the same idea, we get and denote a homothety of center $c$ and ratio $\lambda$ by:
$$ \mathcal{H}(z_1) = c + \lambda(z_1 - c) $$
Some explanations could be useful for some readers. The difference $z_1 - c$ again gives the radius $r$, and $c$ is a central point of the Gauss plane. The expression $\lambda(z_1 - c)$ gives the homothety of the radius from the origin of the Gaussian plane. Finally, adding $c$ gives the translation needed for the homothety to be seen as being made from center $c$.

2.2.8 Quaternion Numbers

Quaternions, also named "hypercomplex numbers", were invented in 1843 by William Rowan Hamilton to generalize complex numbers.

Definition (#27): A "quaternion" is an element $(a, b, c, d) \in \mathbb{R}^4$. We denote by $\mathbb{H}$ the set that contains them, which we name the "set of quaternions". A quaternion can also be represented as a row or a column, such as:
$$ (a, b, c, d) = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} $$
We define the sum of two quaternions $(a, b, c, d)$ and $(a', b', c', d')$ by:
$$ (a, b, c, d) + (a', b', c', d') = (a + a', b + b', c + c', d + d') $$
Remark: It is the natural addition in $\mathbb{R}^4$ seen as an $\mathbb{R}$-vector space (see section Set Theory).

The associativity is verified by applying the corresponding properties of the operations on $\mathbb{R}$. We also define the multiplication:
$$ (a, b, c, d)\cdot(a', b', c', d') $$
of two quaternions $(a, b, c, d)$ and $(a', b', c', d')$
by the expression:
$$ (a, b, c, d)\cdot(a', b', c', d') = \begin{pmatrix} aa' - bb' - cc' - dd' \\ (ab' + ba') + (cd' - dc') \\ (ac' + ca') - (bd' - db') \\ (da' + ad') + (bc' - cb') \end{pmatrix} $$
It may be hard to accept, but we will see a little further below that there is a family resemblance with the complex numbers. We can notice that the law of multiplication is not commutative. Indeed, taking the definition of the multiplication above, we have:
$$ (0, 1, 0, 0)\cdot(0, 0, 1, 0) = (0, 0, 0, 1) $$
$$ (0, 0, 1, 0)\cdot(0, 1, 0, 0) = (0, 0, 0, -1) $$
But we can also notice that:
$$ (0, 1, 0, 0)\cdot(0, 0, 1, 0) = -(0, 0, 1, 0)\cdot(0, 1, 0, 0) $$
The law of multiplication is distributive over the addition law. However, it is an excellent example where we must be careful to prove both left and right distributivity, since the product is not commutative! The multiplication has the neutral element:
$$ (1, 0, 0, 0) $$
Indeed:
$$ (1, 0, 0, 0)\cdot(a, b, c, d) = (a, b, c, d)\cdot(1, 0, 0, 0) = (a, b, c, d) $$
Any element:
$$ (a, b, c, d) \in \mathbb{H}^* = \mathbb{H} - \{(0, 0, 0, 0)\} $$
is invertible. Indeed, if $(a, b, c, d)$ is a non-null quaternion, we necessarily have:
$$ a^2 + b^2 + c^2 + d^2 \neq 0 $$
Otherwise the four numbers $a, b, c, d$ would have null squares, and so would all be zero. Let us then take the quaternion $(a_1, b_1, c_1, d_1)$ defined by:
$$ a_1 = \frac{a}{a^2 + b^2 + c^2 + d^2}, \quad b_1 = \frac{-b}{a^2 + b^2 + c^2 + d^2}, \quad c_1 = \frac{-c}{a^2 + b^2 + c^2 + d^2}, \quad d_1 = \frac{-d}{a^2 + b^2 + c^2 + d^2} $$
By mechanically applying the definition of the multiplication of quaternions, we check that:
$$ (a, b, c, d)\cdot(a_1, b_1, c_1, d_1) = (a_1, b_1, c_1, d_1)\cdot(a, b, c, d) = (1, 0, 0, 0) $$
so this latter quaternion is the inverse for the multiplication!

Let us prove (for general knowledge) that the field of complex numbers $(\mathbb{C}, +, \times)$ is a subfield of $(\mathbb{H}, +, \times)$.
Remark: We could also have put this proof in the section of Set Theory, because we will make use of many concepts that we have seen there. However, it seemed to us a little more relevant to put it here instead. We hope the reader will tolerate this choice.

Let $\mathbb{H}'$ be the set of quaternions of the form $(a, b, 0, 0)$. $\mathbb{H}'$ is not empty. Let $(a, b, 0, 0)$ and $(a', b', 0, 0)$ be elements of $\mathbb{H}'$; then $(\mathbb{H}', +, \times)$ is a field. Indeed:

P1. For subtraction (and therefore addition):
$$ (a, b, 0, 0) - (a', b', 0, 0) = (a - a', b - b', 0, 0) \in \mathbb{H}' $$
P2. For multiplication:
$$ (a, b, 0, 0)\cdot(a', b', 0, 0) = (aa' - bb', ab' + ba', 0, 0) \in \mathbb{H}' $$
P3. The neutral element:
$$ (1, 0, 0, 0) \in \mathbb{H}' $$
P4. And finally, the inverse of a non-zero $(a, b, 0, 0)$ is still in $\mathbb{H}'$:
$$ \left(\frac{a}{a^2 + b^2}, \frac{-b}{a^2 + b^2}, 0, 0\right) \in \mathbb{H}' $$

Therefore $(\mathbb{H}', +, \times)$ is a subfield of $\mathbb{H}$. Let us then consider the application:
$$ f: \mathbb{C} \to \mathbb{H}', \qquad a + ib \mapsto (a, b, 0, 0) $$
$f$ is bijective, and we easily check that for any complex numbers $z_1, z_2$ we have:
$$ f(z_1 + z_2) = f(z_1) + f(z_2), \qquad f(z_1z_2) = f(z_1)f(z_2) $$
Therefore $f$ is an isomorphism of $(\mathbb{C}, +, \times)$ onto $(\mathbb{H}', +, \times)$.

This isomorphism allows us to identify $\mathbb{C}$ with $\mathbb{H}'$ and to write $\mathbb{C} \subset \mathbb{H}$. The laws of addition and multiplication on $\mathbb{H}$ thus extend the already known operations of $\mathbb{C}$. By convention, we will write any element $(a, b, 0, 0)$ of $\mathbb{H}'$ in the complex form $a + ib$. In particular:
- 0 is the element $(0, 0, 0, 0)$;
- 1 is the element $(1, 0, 0, 0)$;
- $i$ is the element $(0, 1, 0, 0)$.

By analogy and by extension, we denote by $j$ the element $(0, 0, 1, 0)$ and by $k$ the element $(0, 0, 0, 1)$. The family $\{1, i, j, k\}$ forms a basis of the quaternions seen as a vector space over $\mathbb{R}$. We will write:
$$ a + bi + cj + dk $$
for the quaternion $(a, b, c, d)$. The notation of quaternions as defined above is perfectly suited to the multiplication operation.
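The multiplication rule defining $\mathbb{H}$ can be transcribed directly. The Python sketch below stores a quaternion as a 4-tuple $(a, b, c, d)$ and uses `fractions.Fraction` so that the inverse is checked exactly; the helper names are hypothetical. It verifies three things: the non-commutativity $ij = -ji$; the inverse formula; and the fact that $a + ib \mapsto (a, b, 0, 0)$ respects products.

```python
from fractions import Fraction


def qmul(p, q):
    """Quaternion product (a, b, c, d) * (a', b', c', d') from the definition."""
    a, b, c, d = p
    a2, b2, c2, d2 = q  # a', b', c', d'
    return (
        a * a2 - b * b2 - c * c2 - d * d2,
        (a * b2 + b * a2) + (c * d2 - d * c2),
        (a * c2 + c * a2) - (b * d2 - d * b2),
        (d * a2 + a * d2) + (b * c2 - c * b2),
    )


def qinv(p):
    """Inverse (a, -b, -c, -d) / (a^2 + b^2 + c^2 + d^2) of a non-null quaternion."""
    n = sum(x * x for x in p)
    if n == 0:
        raise ZeroDivisionError("the null quaternion has no inverse")
    a, b, c, d = (Fraction(x) for x in p)
    return (a / n, -b / n, -c / n, -d / n)


ONE, I, J, K = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(qmul(I, J), qmul(J, I))            # (0, 0, 0, 1) (0, 0, 0, -1): ij = k, ji = -k
print(qmul(I, I), qmul(qmul(I, J), K))   # (-1, 0, 0, 0) (-1, 0, 0, 0)

p = (1, 2, -3, 4)
print(qmul(p, qinv(p)) == ONE and qmul(qinv(p), p) == ONE)  # True

# Embedding of C: (a, b, 0, 0) multiplies like a + ib
print(qmul((2, 3, 0, 0), (-1, 4, 0, 0)), complex(2, 3) * complex(-1, 4))  # (-14, 5, 0, 0) (-14+5j)
```

Exact rational arithmetic avoids any rounding doubt when checking that $p\,p^{-1} = 1$.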
For the product of two quaternions, developing the expression:
$$ (a + bi + cj + dk)\cdot(a' + b'i + c'j + d'k) $$
gives 16 terms. We have to identify them with the original definition of the multiplication of quaternions to get the following relations:
$$ i\cdot j = k = -j\cdot i, \qquad j\cdot k = i = -k\cdot j, \qquad k\cdot i = j = -i\cdot k, \qquad i^2 = j^2 = k^2 = -1 $$
These can be summarized in a table (row element times column element):
$$ \begin{array}{c|cccc} \times & 1 & i & j & k \\ \hline 1 & 1 & i & j & k \\ i & i & -1 & k & -j \\ j & j & -k & -1 & i \\ k & k & j & -i & -1 \end{array} $$
We can see that the expression of the multiplication of two quaternions partly looks much like a vector product (denoted $\times$ in this book) and a dot product (denoted $\circ$ in this book):
$$ \begin{pmatrix} aa' - bb' - cc' - dd' \\ (ab' + ba') + (cd' - dc') \\ (ac' + ca') - (bd' - db') \\ (da' + ad') + (bc' - cb') \end{pmatrix} $$
If this is not evident (which would be quite understandable), let us make a concrete example:

Example: Given two quaternions without real part:
$$ p = xi + yj + zk, \qquad q = x'i + y'j + z'k $$
and $\vec{u}$, $\vec{v}$ the vectors of $\mathbb{R}^3$ of respective components $(x, y, z)$ and $(x', y', z')$. Then the product:
$$ pq = (0, \vec{u})(0, \vec{v}) $$
is equal to:
$$ p\cdot q = (-xx' - yy' - zz',\; yz' - zy',\; -xz' + zx',\; xy' - yx') = (-\vec{u}\circ\vec{v},\; \vec{u}\times\vec{v}) $$
We can also, out of curiosity, look at the general case. Given two quaternions:
$$ p = (a, \vec{u}), \qquad q = (b, \vec{v}) $$
we then have:
$$ p\cdot q = \big(a + (0, \vec{u})\big)\cdot\big(b + (0, \vec{v})\big) = ab + (0, a\vec{v}) + (0, b\vec{u}) + (-\vec{u}\circ\vec{v}, \vec{u}\times\vec{v}) = (ab - \vec{u}\circ\vec{v},\; a\vec{v} + b\vec{u} + \vec{u}\times\vec{v}) $$

Definition (#28): The center of the non-commutative field $(\mathbb{H}, +, \times)$ is the set of elements of $\mathbb{H}$ that commute, for the law of multiplication, with all the elements of $\mathbb{H}$.

Theorem 4.5. The center of $(\mathbb{H}, +, \times)$ is the set of real numbers!

Proof 4.5.1. Let $\mathbb{H}_1$ be the center of $(\mathbb{H}, +, \times)$, and $(x, y, z, t)$ a quaternion.
The following conditions must be met. If $(x, y, z, t) \in \mathbb{H}_1$, then for any $(a, b, c, d) \in \mathbb{H}$ we must have:
$$ (x, y, z, t)\cdot(a, b, c, d) = (a, b, c, d)\cdot(x, y, z, t) $$
which gives, by developing:
$$ \begin{aligned} xa - yb - zc - td &= ax - by - cz - dt \\ xb + ya + zd - tc &= ay + bx + ct - dz \\ xc + za - yd + tb &= az + cx - bt + dy \\ ta + xd + yc - zb &= dx + at + bz - cy \end{aligned} $$
After simplification, the first line of the previous system is identical on both sides of the equality. Each of the other lines reduces, after division by 2, to one of the following relations:
$$ \begin{aligned} ct - dz &= 0 \\ bt - dy &= 0 \\ bz - cy &= 0 \end{aligned} $$
These relations must hold for every $(b, c, d) \in \mathbb{R}^3$ (take for example $(b, c, d) = (1, 0, 0)$, then $(0, 1, 0)$). The resolution of this system therefore gives us:
$$ y = z = t = 0 $$
So for the quaternion $(x, y, z, t)$ to be in the center of $\mathbb{H}$, it must be real (no imaginary parts)!
Q.E.D.

Just as for complex numbers, we can define a conjugate of quaternions:

Definition (#29): The conjugate of a quaternion $Z = (a, b, c, d)$ is the quaternion $\bar{Z} = (a, -b, -c, -d)$.

Just as for complex numbers, we notice that:
1. First, clearly, if $\bar{Z} = Z$ then $Z \in \mathbb{R}$.
2. $Z + \bar{Z} \in \mathbb{R}$.
3. By developing the product $Z\bar{Z}$ we have:
$$ Z\bar{Z} = (a, b, c, d)\cdot(a, -b, -c, -d) = (a^2 + b^2 + c^2 + d^2,\; -ab + ba - cd + dc,\; -ac + ca + bd - db,\; da - ad - bc + cb) = (a^2 + b^2 + c^2 + d^2, 0, 0, 0) = a^2 + b^2 + c^2 + d^2 \in \mathbb{R} $$
By analogy with complex numbers, we adopt this as the definition of the norm (or module) of quaternions:
$$ |Z| = \sqrt{Z\bar{Z}} $$
Therefore we also immediately have the following relation, which will be useful later:
$$ |ZZ'| = \sqrt{(ZZ')\,\overline{(ZZ')}} $$
As for complex numbers (see above), it is easy to show that conjugation is an automorphism of the group $(\mathbb{H}, +)$:
$$ \overline{Z + Z'} = (a + a', -b - b', -c - c', -d - d') = (a, -b, -c, -d) + (a', -b', -c', -d') = \bar{Z} + \bar{Z}' $$
It is also easy to prove that it is involutive. Indeed:
$$ \overline{\bar{Z}} = (a, -(-b), -(-c), -(-d)) = (a, b, c, d) = Z $$
But conjugation is not a multiplicative automorphism of the field $(\mathbb{H}, +, \times)$.
Indeed, if we consider the multiplication of $Z$ and $Z'$ and take the conjugate:
$$ ZZ' = \begin{pmatrix} aa' - bb' - cc' - dd' \\ (ab' + ba') + (cd' - dc') \\ (ac' + ca') - (bd' - db') \\ (da' + ad') + (bc' - cb') \end{pmatrix}, \qquad \overline{ZZ'} = \begin{pmatrix} aa' - bb' - cc' - dd' \\ -(ab' + ba') - (cd' - dc') \\ -(ac' + ca') + (bd' - db') \\ -(da' + ad') - (bc' - cb') \end{pmatrix} $$
Developing $\bar{Z}\bar{Z}' = (a, -b, -c, -d)\cdot(a', -b', -c', -d')$, on the other hand, gives $-(ab' + ba') + (cd' - dc')$ on the second row. We therefore see immediately (at least on the second row) that in general:
$$ \overline{ZZ'} \neq \bar{Z}\,\bar{Z}' $$
One can check in the same way that the correct relation is $\overline{ZZ'} = \bar{Z}'\,\bar{Z}$, with the order of the factors reversed.

Let us now come back to our norm (or module). For this, let us calculate the square of the norm $|ZZ'|$. We know (by definition) that:
$$ |ZZ'|^2 = (ZZ')\cdot\overline{(ZZ')} $$
Let us denote the product $ZZ'$ given above by:
$$ ZZ' = (\alpha, \beta, \gamma, \delta) = Z'' $$
Then we have:
$$ |ZZ'|^2 = |Z''|^2 = \alpha^2 + \beta^2 + \gamma^2 + \delta^2 $$
Substituting, it comes:
$$ (ZZ')\cdot\overline{(ZZ')} = (aa' - bb' - cc' - dd')^2 + (ab' + ba' + cd' - dc')^2 + (ac' + ca' - bd' + db')^2 + (da' + ad' + bc' - cb')^2 $$
After an elementary (frankly boring) algebraic development we find:
$$ (ZZ')\cdot\overline{(ZZ')} = (a'^2 + b'^2 + c'^2 + d'^2)(a^2 + b^2 + c^2 + d^2) = |Z'|^2|Z|^2 $$
Therefore:
$$ (ZZ')\cdot\overline{(ZZ')} = |Z|^2|Z'|^2 = |ZZ'|^2 $$
Remark: The norm is therefore a homomorphism of $(\mathbb{H}, \times)$ into $(\mathbb{R}, \times)$.

Subsequently, we will denote by $G$ the set of quaternions of unit norm.

2.2.8.1 Matrix Interpretation of Quaternions

Given two quaternions $q$ and $p$, consider the application:
$$ p \mapsto qp $$
This (left) multiplication is a linear application (see section Linear Algebra) on $\mathbb{H}$. If $q$ is written $a + bi + cj + dk$, this application has the following matrix in the basis $1, i, j, k$:
$$ \begin{pmatrix} a & -b & -c & -d \\ b & a & -d & c \\ c & d & a & -b \\ d & -c & b & a \end{pmatrix} $$
We check this on $Z' = (a', b', c', d')$, with $q = Z$:
$$ \begin{pmatrix} a & -b & -c & -d \\ b & a & -d & c \\ c & d & a & -b \\ d & -c & b & a \end{pmatrix}\begin{pmatrix} a' \\ b' \\ c' \\ d' \end{pmatrix} = \begin{pmatrix} aa' - bb' - cc' - dd' \\ (ab' + ba') + (cd' - dc') \\ (ac' + ca') - (bd' - db') \\ (da' + ad') + (bc' - cb') \end{pmatrix} = ZZ' $$
In fact, we could then define the quaternions as the set of matrices with the structure visible above, if we wanted to.
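Three statements above can all be checked exactly with integer quaternions: the conjugate of a product, the multiplicativity of the norm, and the left-multiplication matrix. The following is a minimal Python sketch with hypothetical helper names, storing quaternions as 4-tuples. The product is re-implemented from the definition so that the block is self-contained.

```python
import random


def qmul(p, q):
    """Quaternion product from the definition of the multiplication."""
    a, b, c, d = p
    a2, b2, c2, d2 = q
    return (
        a * a2 - b * b2 - c * c2 - d * d2,
        (a * b2 + b * a2) + (c * d2 - d * c2),
        (a * c2 + c * a2) - (b * d2 - d * b2),
        (d * a2 + a * d2) + (b * c2 - c * b2),
    )


def qconj(p):
    """Conjugate (a, -b, -c, -d)."""
    return (p[0], -p[1], -p[2], -p[3])


def norm2(p):
    """Squared norm a^2 + b^2 + c^2 + d^2, i.e. the real part of p * conj(p)."""
    return sum(x * x for x in p)


def left_matrix(q):
    """Matrix of p -> q p in the basis (1, i, j, k)."""
    a, b, c, d = q
    return [[a, -b, -c, -d], [b, a, -d, c], [c, d, a, -b], [d, -c, b, a]]


def mat_vec(M, v):
    """Product of a 4x4 matrix with a 4-component vector."""
    return tuple(sum(M[r][s] * v[s] for s in range(4)) for r in range(4))


rng = random.Random(42)
for _ in range(500):
    Z = tuple(rng.randint(-9, 9) for _ in range(4))
    W = tuple(rng.randint(-9, 9) for _ in range(4))
    assert qmul(Z, qconj(Z)) == (norm2(Z), 0, 0, 0)       # Z conj(Z) is real
    assert norm2(qmul(Z, W)) == norm2(Z) * norm2(W)       # |ZZ'| = |Z||Z'|
    assert qconj(qmul(Z, W)) == qmul(qconj(W), qconj(Z))  # order reversed
    assert mat_vec(left_matrix(Z), W) == qmul(Z, W)       # matrix of p -> Zp

I, J = (0, 1, 0, 0), (0, 0, 1, 0)
print(qconj(qmul(I, J)), qmul(qconj(I), qconj(J)))  # (0, 0, 0, -1) (0, 0, 0, 1)
```

Integer components keep every comparison exact, so the checks do not depend on any floating-point tolerance.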
This then reduces them to a vector subspace of $M_4(\mathbb{R})$. In particular, the matrix of 1 (the real part of the quaternion $q$) is then nothing other than the identity matrix:
$$ M_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \mathbb{1} $$
and likewise:
$$ M_i = \begin{pmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad M_j = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix}, \quad M_k = \begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} $$

2.2.8.2 Rotations with Quaternions

We will now see that conjugation by an element of the group $G$ of quaternions of unit norm can be interpreted as a pure rotation in space!

Definition (#30): The "conjugation" by a non-null quaternion $q$ of unit norm is the application $S_q$ defined on $\mathbb{H}$ by:
$$ S_q: p \mapsto q\cdot p\cdot q^{-1} = q\cdot p\cdot\bar{q} $$
and we affirm that this application is a rotation.

Remarks:
R1. As $q$ is of unit norm, we obviously have $|q|^2 = q\bar{q} = 1$, therefore $q^{-1} = \bar{q}$. This quaternion can be seen as the proper value (of unit norm) of the application (matrix) $p$ on the vector $q$. We are in a situation similar to that of the orthogonal rotation matrices seen in the section Linear Algebra.
R2. $S_q$ is a linear application (so if it is a rotation, the rotation can be decomposed into several rotations). Indeed, let us consider two quaternions $p_1, p_2$ and two real numbers $\lambda_1, \lambda_2$; then we have:
$$ S_q(\lambda_1p_1 + \lambda_2p_2) = q(\lambda_1p_1 + \lambda_2p_2)\bar{q} = \lambda_1qp_1\bar{q} + \lambda_2qp_2\bar{q} = \lambda_1S_q(p_1) + \lambda_2S_q(p_2) $$

Let us now check that the application is indeed a pure rotation. As we saw in our study of orthogonal matrices (see section Linear Algebra), a first obvious condition is that the application conserves the norm. Let us check this:
$$ |S_q(p)| = |qp\bar{q}| = |q|\,|p|\,|\bar{q}| = |p| $$
Moreover, let us take a purely imaginary quaternion $p$, so that we restrict ourselves to $\mathbb{R}^3$. We can check that the sum of its image and of the conjugate of its image is zero (a vector summed with its opposite cancels):
$$ S_q(p) + \overline{S_q(p)} = qp\bar{q} + \overline{qp\bar{q}} $$
We easily check that for any two quaternions $q, p$ we have $\overline{p\cdot q} = \bar{q}\cdot\bar{p}$, so that:
$$ S_q(p) + \overline{S_q(p)} = qp\bar{q} + \overline{q(p\bar{q})} = qp\bar{q} + \overline{(p\bar{q})}\,\bar{q} = qp\bar{q} + q\bar{p}\bar{q} = q(p + \bar{p})\bar{q} = S_q(p + \bar{p}) $$
For this to be zero, we immediately see that we need to restrict ourselves to purely imaginary quaternions $p$. Then $p + \bar{p} = 0$ and:
$$ S_q(p + \bar{p}) = S_q(0) = 0 $$
We conclude that for $p$ purely imaginary, $S_q(p)$ is also a pure quaternion. In other words, the set of pure quaternions is stable under this application: a pure quaternion remains a pure quaternion. The application $S_q$ restricted to purely imaginary quaternions is thus a vector isometry, that is to say a symmetry or a rotation.

We have also seen, during our study of rotation matrices in the sections of Linear Algebra and Euclidean Geometry, that such matrices must have a determinant equal to 1 for us to have a rotation. Let us see if this is the case for $S_q$. For this, we explicitly calculate the matrix of $S_q$ in the canonical basis $(i, j, k)$ as a function of:
$$ q = a + bi + cj + dk $$
and then we calculate its determinant.
We obtain the coefficients of the columns of this application by remembering that:

$$ij=k=-ji,\qquad jk=i=-kj,\qquad ki=j=-ik,\qquad i^2=j^2=k^2=-1\tag{2.217}$$

and then by calculating:

$$\begin{aligned}S_q(i)&=(a+bi+cj+dk)\,i\,(a-bi-cj-dk)\\&=(ai+b\,i^2+c\,ji+d\,ki)(a-bi-cj-dk)\\&=(ai-b-ck+dj)(a-bi-cj-dk)\\&=(a^2i+ab-ack+adj)-(ba-b^2i-bcj-bdk)-(cak-cbj+c^2i+cd)+(daj+dbk+dc-d^2i)\\&=(a^2+b^2-c^2-d^2)\,i+2(ad+bc)\,j+2(bd-ac)\,k\end{aligned}\tag{2.218}$$

$$\begin{aligned}S_q(j)&=(a+bi+cj+dk)\,j\,(a-bi-cj-dk)\\&=(aj+b\,ij+c\,j^2+d\,kj)(a-bi-cj-dk)\\&=(aj+bk-c-di)(a-bi-cj-dk)\\&=(a^2j+abk+ac-adi)+(bak-b^2j+bci+bd)-(ca-cbi-c^2j-cdk)-(dai+db-dck+d^2j)\\&=2(bc-ad)\,i+(a^2-b^2+c^2-d^2)\,j+2(ab+cd)\,k\end{aligned}\tag{2.219}$$

$$\begin{aligned}S_q(k)&=(a+bi+cj+dk)\,k\,(a-bi-cj-dk)\\&=(ak+b\,ik+c\,jk+d\,k^2)(a-bi-cj-dk)\\&=(ak-bj+ci-d)(a-bi-cj-dk)\\&=(a^2k-abj+aci+ad)-(abj+b^2k+bc-bdi)+(cai+cb-c^2k+cdj)-(da-dbi-dcj-d^2k)\\&=2(ac+bd)\,i+2(cd-ab)\,j+(a^2-b^2-c^2+d^2)\,k\end{aligned}\tag{2.220}$$

We must then calculate the determinant of the following matrix, whose rows are the coordinates of $S_q(i)$, $S_q(j)$ and $S_q(k)$ (its transpose, which has the same determinant, is the matrix of $S_q$) (pfff...):

$$\begin{pmatrix}a^2+b^2-c^2-d^2&2(ad+bc)&2(bd-ac)\\2(bc-ad)&a^2-b^2+c^2-d^2&2(ab+cd)\\2(ac+bd)&2(cd-ab)&a^2-b^2-c^2+d^2\end{pmatrix}\tag{2.221}$$

Remembering that (which also simplifies the expression of the diagonal terms, as we can see in some books):

$$a^2+b^2+c^2+d^2=1\tag{2.222}$$

we find that the determinant is indeed equal to 1.
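The columns (2.218) to (2.220) and the value of the determinant can also be spot-checked without a computer algebra system. Below is a minimal pure-Python sketch (the helper names `qmul`, `qconj`, `S`, `rows_2221` and `det3` are our own hypothetical choices): for random integer quaternions, not necessarily of unit norm, it compares $q\,e\,\bar q$ for $e=i,j,k$ with the rows of the matrix (2.221), and checks that the determinant equals $(a^2+b^2+c^2+d^2)^3$, which reduces to 1 under (2.222). Integer inputs keep the comparisons exact; passing random checks is evidence, not a proof.

```python
import random

def qmul(q, p):
    """Hamilton product q*p, with quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a, b, c, d = q
    e, f, g, h = p
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g + c*e + d*f - b*h,
            a*h + d*e + b*g - c*f)

def qconj(q):
    """Conjugate quaternion: same real part, opposite imaginary part."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def S(q, p):
    """Conjugation S_q(p) = q p conj(q), computed without normalizing q."""
    return qmul(qmul(q, p), qconj(q))

def rows_2221(a, b, c, d):
    """Rows of (2.221): the coordinates of S_q(i), S_q(j), S_q(k)."""
    return [[a*a + b*b - c*c - d*d, 2*(a*d + b*c), 2*(b*d - a*c)],
            [2*(b*c - a*d), a*a - b*b + c*c - d*d, 2*(a*b + c*d)],
            [2*(a*c + b*d), 2*(c*d - a*b), a*a - b*b - c*c + d*d]]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1]*m[2][2] - m[1][2]*m[2][1])
            - m[0][1] * (m[1][0]*m[2][2] - m[1][2]*m[2][0])
            + m[0][2] * (m[1][0]*m[2][1] - m[1][1]*m[2][0]))

rng = random.Random(1)
basis = [(0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]   # i, j, k
for _ in range(500):
    q = tuple(rng.randint(-5, 5) for _ in range(4))
    m = rows_2221(*q)
    for row, e in zip(m, basis):
        assert S(q, e) == (0, *row)           # pure quaternion whose coordinates are the row
    assert det3(m) == sum(x * x for x in q) ** 3
print("all checks passed")
```

The cube appears because every entry of (2.221) is homogeneous of degree 2 in $(a,b,c,d)$: the matrix is $|q|^2$ times a rotation matrix, so its determinant is $|q|^6$.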
Otherwise, we can check this with Maple 4.00b:

>with(linalg):
>A:=linalg[matrix](3,3,[a^2+b^2-c^2-d^2,2*(a*d+b*c),2*(b*d-a*c),2*(b*c-a*d),a^2-b^2+c^2-d^2,2*(a*b+c*d),2*(a*c+b*d),2*(c*d-a*b),a^2-b^2-c^2+d^2]);
>factor(det(A));

which returns $(a^2+b^2+c^2+d^2)^3$, equal to 1 for a quaternion of unit norm.

Let us now show that this rotation is a half-turn about an axis (the example may seem special, but it is in fact general!). First, if:

$$q=xi+yj+zk\tag{2.223}$$

we have:

$$S_q(q)=qq\bar q=q\tag{2.224}$$

which means that the axis of rotation $(x,y,z)$ is fixed by the application $S_q$ itself! On the other hand, we have seen that if $q$ is a purely imaginary quaternion of norm 1, then:

$$q^{-1}=\bar q\quad\text{and}\quad\bar q=-q\tag{2.225}$$

which gives us the relation:

$$q^2=q\cdot(-\bar q)=q\cdot(-q^{-1})=-1\tag{2.226}$$

This result leads us to calculate the rotation of a rotation:

$$S_q(S_q(p))=q(qp\bar q)\bar q=q^2p\,\bar q^{\,2}=S_{q^2}(p)=(-1)\,p\,(-1)=p\tag{2.227}$$

Conclusion: since applying this rotation twice gives a full turn, $S_q$ is necessarily a half-turn:

$$S_q(p)=-p\tag{2.228}$$

for every pure quaternion $p$ orthogonal to the axis $(x,y,z)$. (Directly: if $p$ is pure and orthogonal to $q$, then $pq=-qp$, so $S_q(p)=-qpq=q^2p=-p$.)

At this stage, we can say that any rotation of space can be represented by some $S_q$ (the conjugation by a quaternion $q$ of norm 1). Indeed, the half-turns generate the group of rotations, that is to say that any rotation can be expressed as the product of a finite number of half-turns. Since $S_{q_1}\circ S_{q_2}=S_{q_1q_2}$, such a product is the conjugation by a product of quaternions of unit norm (a product which is itself a quaternion of unit norm...).

We will still give an explicit form connecting a rotation and the quaternion that represents it, just as we did for complex numbers.

Theorem 4.6. Let $u(x,y,z)$ be a unit vector and $\theta\in[0,2\pi]$ an angle.
Then we claim that the rotation of axis $u$ and angle $\theta$ corresponds to the application $S_q$, where $q$ is the quaternion:

$$q=\cos\left(\frac{\theta}{2}\right)+\sin\left(\frac{\theta}{2}\right)(xi+yj+zk)\tag{2.229}$$

For this assertion to be verified, we need that:

• The norm of $q$ is equal to 1
• The determinant of the application $S_q$ is equal to 1
• The application $S_q$ preserves the norm
• The application $S_q$ leaves every vector collinear to the axis of rotation unchanged.

Proof 4.6.1. OK, let us check every point:

1. The norm of the proposed quaternion is indeed equal to 1:

$$|q|^2=\cos^2\left(\frac{\theta}{2}\right)+\sin^2\left(\frac{\theta}{2}\right)(x^2+y^2+z^2)\tag{2.230}$$

and as $u(x,y,z)$ has unit norm, we have:

$$x^2+y^2+z^2=1\tag{2.231}$$

Therefore:

$$|q|^2=\cos^2\left(\frac{\theta}{2}\right)+\sin^2\left(\frac{\theta}{2}\right)(x^2+y^2+z^2)=\cos^2\left(\frac{\theta}{2}\right)+\sin^2\left(\frac{\theta}{2}\right)=1\tag{2.232}$$

2. The fact that $q$ is a quaternion of unit norm immediately implies that the determinant of the application $S_q$ is also equal to 1: we have already proved it above for any quaternion of norm 1.

3. The same holds for the preservation of the norm: we have already proved above that this is the case for any quaternion $q$ of norm 1.

4. Let us now prove that every vector collinear to the axis of rotation is left unchanged. Let us denote by $q'$ the purely imaginary unit quaternion $xi+yj+zk$. Then we have:

$$q=\cos\left(\frac{\theta}{2}\right)+\sin\left(\frac{\theta}{2}\right)q'\tag{2.233}$$

and:

$$S_q(q')=qq'\bar q\tag{2.234}$$

But as $q$ is a real linear combination of $1$ and $q'$, it commutes with $q'$, so that:

$$S_q(q')=q'q\bar q=q'|q|^2=q'\tag{2.235}$$

Let us now show why we chose the writing $\theta/2$.
If $v=(x_1,y_1,z_1)$ denotes a unit vector orthogonal to $u$ (therefore perpendicular to the axis of rotation), and $p$ the quaternion $x_1i+y_1j+z_1k$, then we have:

$$\begin{aligned}S_q(p)&=\left(\cos\left(\tfrac{\theta}{2}\right)+\sin\left(\tfrac{\theta}{2}\right)q'\right)p\left(\cos\left(\tfrac{\theta}{2}\right)-\sin\left(\tfrac{\theta}{2}\right)q'\right)\\&=\cos^2\left(\tfrac{\theta}{2}\right)p+\cos\left(\tfrac{\theta}{2}\right)\sin\left(\tfrac{\theta}{2}\right)(q'p-pq')-\sin^2\left(\tfrac{\theta}{2}\right)q'pq'\end{aligned}\tag{2.236}$$

We have shown, during the definition of the multiplication of two quaternions, that for two orthogonal pure quaternions:

$$pq'=-q'p\tag{2.237}$$

therefore we get:

$$\begin{aligned}S_q(p)&=\cos^2\left(\tfrac{\theta}{2}\right)p+2\cos\left(\tfrac{\theta}{2}\right)\sin\left(\tfrac{\theta}{2}\right)q'p+\sin^2\left(\tfrac{\theta}{2}\right)q'p(-q')\\&=\cos^2\left(\tfrac{\theta}{2}\right)p+2\cos\left(\tfrac{\theta}{2}\right)\sin\left(\tfrac{\theta}{2}\right)q'p+\sin^2\left(\tfrac{\theta}{2}\right)q'p\overline{q'}\end{aligned}\tag{2.238}$$

We have also proved above that, since $p$ is orthogonal to the axis (the half-turn about the axis $(x,y,z)$):

$$S_{q'}(p)=q'p\overline{q'}=-p\tag{2.239}$$

So:

$$S_q(p)=\cos^2\left(\tfrac{\theta}{2}\right)p+2\cos\left(\tfrac{\theta}{2}\right)\sin\left(\tfrac{\theta}{2}\right)q'p-\sin^2\left(\tfrac{\theta}{2}\right)p\tag{2.240}$$

$$\begin{aligned}S_q(p)&=\left[\cos^2\left(\tfrac{\theta}{2}\right)-\sin^2\left(\tfrac{\theta}{2}\right)\right]p+2\cos\left(\tfrac{\theta}{2}\right)\sin\left(\tfrac{\theta}{2}\right)q'p\\&=\cos(\theta)\,p+\sin(\theta)\,q'p\end{aligned}\tag{2.241}$$

□ Q.E.D.

We know that $p$ is the pure quaternion identified with a unit vector $v$ orthogonal to the axis of rotation $u$, itself identified with the purely imaginary quaternion $q'$. We notice immediately that the imaginary part of the (well-defined!) product $q'p$ is equal to the cross product $u\times v=w$ (its real part $-u\cdot v$ is zero here, so $q'p=w$). This cross product is a vector perpendicular to both $u$ and $v$. The pair $(v,w)$ thus spans the plane perpendicular to the axis of rotation $u$ (as with the complex numbers $\mathbb{C}$, where we have the Gaussian plane and, perpendicular to it, the axis of rotation!). Then finally:

$$S_q(p)=\cos(\theta)\,p+\sin(\theta)\,q'p=\cos(\theta)\,v+\sin(\theta)\,w\tag{2.242}$$

We fall back on a rotation within a plane (but a plane located in space!), identical to the one shown earlier with the standard complex numbers $\mathbb{C}$ in the Gaussian plane. For more details, the reader can refer to the section Spinor Calculus.

So we know how to perform any kind of rotation in space in a single mathematical operation, with a bonus: the free choice of the axis!
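The formula of Theorem 4.6 translates directly into code. Below is a minimal sketch in plain Python, assuming quaternions are stored as tuples `(a, b, c, d)` for $a+bi+cj+dk$; `qmul`, `qconj`, `rotation_quaternion` and `rotate` are our own illustrative (hypothetical) names. It builds $q=\cos(\theta/2)+\sin(\theta/2)(xi+yj+zk)$ after normalizing the axis, then returns the vector part of $q\,p\,\bar q$, exactly as in (2.210).

```python
import math

# Quaternions are tuples (a, b, c, d) standing for a + b*i + c*j + d*k.

def qmul(q, p):
    """Hamilton product q*p (rules ij = k, jk = i, ki = j, i^2 = j^2 = k^2 = -1)."""
    a, b, c, d = q
    e, f, g, h = p
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g + c*e + d*f - b*h,
            a*h + d*e + b*g - c*f)

def qconj(q):
    """Conjugate: same real part, opposite imaginary part."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def rotation_quaternion(axis, theta):
    """Unit quaternion cos(theta/2) + sin(theta/2)(x i + y j + z k) for the normalized axis."""
    x, y, z = axis
    n = math.sqrt(x*x + y*y + z*z)
    s = math.sin(theta / 2) / n
    return (math.cos(theta / 2), s * x, s * y, s * z)

def rotate(v, axis, theta):
    """Rotate the 3D vector v by the angle theta about axis, via S_q(p) = q p conj(q)."""
    q = rotation_quaternion(axis, theta)
    p = (0.0, *v)                        # pure quaternion associated with v
    r = qmul(qmul(q, p), qconj(q))
    return r[1:]                         # the real part is 0 (up to rounding)

print(rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2))
```

For example, rotating $(1,0,0)$ by a quarter turn about the $z$ axis gives $(0,1,0)$ up to floating-point rounding, and a vector along the axis is returned unchanged. Composing two rotations amounts to multiplying the quaternions, since $S_{q_1}\circ S_{q_2}=S_{q_1q_2}$; swapping the order of two rotations about different axes generally changes the result.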
We can now better understand why the algebra of quaternions is not commutative. Indeed, vector rotations of the plane commute, but those of space do not, as the example below shows. Given the initial configuration:

[Figure 4.15 - Starting situation for quaternion rotations]

a rotation about the X axis followed by a rotation about the Y axis is not equal to a rotation about the Y axis followed by a rotation about the X axis:

[Figure 4.17 - Example of non-equivalence for quaternion rotation]

These results will be fundamental for our understanding of spinors (see section Spinor Calculus)!

2.2.9 Algebraic and Transcendental Numbers

Definitions (#31):

D1. We name "algebraic integer of degree n" any complex number that is a solution of a univariate algebraic equation of degree n, i.e. of a polynomial of degree n (a concept that we will discuss in the chapter Algebra) whose coefficients are integers and whose leading coefficient is equal to 1.

D2. We name "algebraic number of degree n" any complex number that is a solution of a univariate algebraic equation of degree n, i.e. of a polynomial of degree n whose coefficients are rational.

The set of algebraic numbers is sometimes denoted by $\bar{\mathbb{Q}}$ or $\mathbb{A}$.

Theorem 4.7. A first interesting result in this area of study (a mathematical curiosity...) is that a rational number is an "algebraic integer of degree n" if and only if it is an integer (read it several times if needed...). In scientific terms, we say that the ring $\mathbb{Z}$ is "integrally closed".

Proof 4.7.1. Let us assume that the number $p/q$, where $p$ and $q$ are two coprime integers (that is to say, their greatest common divisor is equal to 1)
is a root of the following polynomial (see section Calculus) with relative integer coefficients ($\in\mathbb{Z}$) and whose leading coefficient is equal to 1:

$$x^n+a_{n-1}x^{n-1}+\cdots+a_1x+a_0\tag{2.243}$$

where the equality of the polynomial with zero is implicit. Substituting $x=p/q$ and multiplying by $q^n$, we get:

$$p^n=-(a_{n-1}p^{n-1}+\cdots+a_1pq^{n-2}+a_0q^{n-1})\,q\tag{2.244}$$

Since the coefficients are by definition all integers, and so are their products with powers of $p$ and $q$, the parenthesis necessarily has a value in $\mathbb{Z}$. Therefore $q$ (on the right of the parenthesis) divides a power of $p$ (on the left of the equality). As $p$ and $q$ are coprime, any prime factor of $q$ would divide $p^n$, hence $p$, which is impossible; so $q$ has no prime factor, that is $q=\pm1$. So among all rational numbers, the only ones that are solutions of polynomial equations with relative integer coefficients ($\in\mathbb{Z}$) and leading coefficient equal to 1 are the relative integers!

□ Q.E.D.

To take another interesting particular case, it is easy to show that any rational number is an algebraic number. Indeed, let us take the simplest univariate polynomial:

$$qx-p=0\tag{2.245}$$

where $p$ and $q\neq0$ are integers (which we may take coprime). As this is a polynomial with rational coefficients ($\in\mathbb{Q}$), after rearrangement we have:

$$x=\frac{p}{q}\tag{2.246}$$

So every rational number is indeed an "algebraic number of degree 1".

We also have the real (and irrational) number $\sqrt{2}$, which is an "algebraic integer of degree 2" because it is a root of:

$$x^2-2=0\tag{2.247}$$

and the complex number $i$, which is also an "algebraic integer of degree 2" because it is a root of the equation:

$$x^2+1=0\tag{2.248}$$

etc.

Definition (#32): A "transcendental number" is a real or complex number that is not algebraic; that is, it is not a root of any non-zero polynomial equation with rational coefficients.

Theorem 4.8.
The set of all transcendental numbers is uncountable. The proof is simple and requires no difficult mathematical development.

Proof 4.8.1. Indeed, since the polynomials with integer coefficients are countable, and since each of these polynomials has a finite number of roots (see the Factorization Theorem in the section Calculus), the set of algebraic numbers is countable! But Cantor's diagonal argument (see section Set Theory) shows that the real numbers (and therefore also the complex numbers) are uncountable. Since the union of two countable sets would be countable, the set of all transcendental numbers must be uncountable. In other words, there are many more transcendental numbers than algebraic numbers...

□ Q.E.D.

The best known transcendental numbers are $\pi$ and $e$. We are still looking for a proof nicer and more intuitive than those of Hilbert or Lindemann-Weierstrass to provide you.

Here is a small summary of everything seen until now:

[Figure 4.18 - Numbers Type N, Z, Q, R, C,... (diagram of the nested sets: natural N, integer Z, rational Q, real algebraic A_R, real R, algebraic A, transcendental, complex C, drawn in the plane of real and imaginary parts)]

2.2.10 Universe Numbers (normal numbers)

Definition (#33): A "Universe number", also named "normal number", is a real number whose infinite sequence of digits in every base $b$ is distributed uniformly, in the sense that each of the $b$ digit values has the same natural density $1/b$. Intuitively, this means that no digit, or (finite) combination of digits, occurs more frequently than any other. The set of Universe numbers is sometimes denoted $\mathbb{U}$.

While a general proof can be given that almost all real numbers are Universe numbers, this proof is not constructive, and only very few specific numbers have been shown to be Universe numbers.
It is widely believed that the (computable) numbers $\sqrt{2}$, $\pi$ and $e$ are Universe numbers, but a proof remains elusive (still as of 2016). All of them are strongly conjectured to be, because of empirical evidence. It is not even known whether all digits occur infinitely often in the decimal expansions of these constants. In particular, the popular claims "every string of numbers eventually occurs in $\pi$" or "the whole Holy Book is contained in $\pi$" are not known to be true. It has been conjectured that every irrational algebraic number is a Universe number; while no counterexamples are known, there also exists no algebraic number that has been proven to be a Universe number in any base.

More formally, let $\Sigma$ be a finite alphabet of $b$ digits, and $\Sigma^\infty$ the set of all sequences that may be drawn from that alphabet. Let $S\in\Sigma^\infty$ be such a sequence. For each $a$ in $\Sigma$, let $N_S(a,n)$ denote the number of times the letter $a$ appears in the first $n$ digits of the sequence $S$. We say that $S$ is a "simple Universe number" if the limit:

$$\lim_{n\to+\infty}\frac{N_S(a,n)}{n}=\frac{1}{b}\tag{2.249}$$

holds for each $a$. Now let $w$ be any finite string in $\Sigma^*$ and let $N_S(w,n)$ be the number of times the string $w$ appears as a substring in the first $n$ digits of the sequence $S$ (for instance, if $S=01010101\ldots$, then $N_S(010,8)=3$). Then $S$ is a "Universe number" if, for all finite strings $w\in\Sigma^*$:

$$\lim_{n\to+\infty}\frac{N_S(w,n)}{n}=\frac{1}{b^{|w|}}\tag{2.250}$$

where $|w|$ denotes the length of the string $w$. $S$ is therefore a Universe number if all strings of equal length occur with equal asymptotic frequency. A given infinite sequence either is or is not a Universe number, whereas a real number, having a different base-$b$ expansion for each integer $b\geq2$, may be a Universe number in one base but not in another.

A "disjunctive sequence" is a sequence in which every finite string appears. A Universe number sequence is a disjunctive sequence, but a disjunctive sequence need not be a Universe number.
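The counting functions $N_S(a,n)$ and $N_S(w,n)$ are easy to compute on a finite prefix of digits. The sketch below (plain Python; `count_occurrences`, `digit_frequencies`, `decimal_digits` and `champernowne_digits` are our own hypothetical helper names) counts overlapping occurrences exactly as in the example $N_S(010,8)=3$, and uses long division to show why a rational number such as $1/7$ cannot be normal: its digits repeat with period 6, so digits such as 0 never appear. It also generates Champernowne's constant $0.123456789101112\ldots$, which Champernowne proved to be normal in base 10. Finite counts can only illustrate the definition; normality is a statement about limits.

```python
from collections import Counter
from fractions import Fraction

def count_occurrences(s: str, w: str) -> int:
    """N_S(w, n) with n = len(s): number of (possibly overlapping) occurrences of w in s."""
    k = len(w)
    return sum(1 for i in range(len(s) - k + 1) if s[i:i + k] == w)

def digit_frequencies(s: str) -> dict:
    """Empirical frequencies N_S(a, n) / n for each digit a present in s."""
    n = len(s)
    return {a: Fraction(c, n) for a, c in Counter(s).items()}

def decimal_digits(p: int, q: int, n: int) -> str:
    """First n decimal digits of the fractional part of p/q, by long division."""
    r = p % q
    out = []
    for _ in range(n):
        r *= 10
        out.append(str(r // q))
        r %= q
    return "".join(out)

def champernowne_digits(n: int) -> str:
    """First n digits after the decimal point of 0.123456789101112..."""
    parts, total, k = [], 0, 1
    while total < n:
        t = str(k)
        parts.append(t)
        total += len(t)
        k += 1
    return "".join(parts)[:n]

print(count_occurrences("01010101", "010"))          # the example of the text
print(digit_frequencies(decimal_digits(1, 7, 600)))  # only 1, 4, 2, 8, 5, 7, each with frequency 1/6
print(digit_frequencies(champernowne_digits(100_000)))
```

The Champernowne frequencies drift slowly towards $1/10$ as the prefix grows, which is typical: convergence in (2.249) can be very slow even for a proven normal number.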
It is possible to prove (but we do not wish to present this proof in a book on applied mathematics), with the "Universe number theorem", that almost all real numbers are Universe numbers. The set of non-Universe numbers, though "small" in the sense of being a null set, is "large" in the sense of being uncountable. For example, no rational number is normal in any base, since the digit sequences of rational numbers are eventually periodic! Likewise, there are uncountably many numbers whose decimal expansion does not contain the digit 5, and none of these is a Universe number.

2.2.11 Abstract Numbers (variables)

Definitions (#34): A number may be considered independently of the nature of the objects that constitute the group it characterizes, as well as of the way it is written (Indian notation, Roman notation, etc.). We then say that the number is an "abstract number". In other words, an abstract number is a number that does not designate the quantity of any particular kind of thing.

Remark

Arbitrarily, human beings have adopted the numerical system most widely used in the world, represented by the symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 of the decimal system, which the reader is assumed to know both in writing and orally (language learning).

For mathematicians, it is not advantageous to work with these symbols, because they represent only specific cases. What theoretical physicists and mathematicians seek are "literal relations", applicable in the general case, in which engineers can then replace, according to their needs, these abstract numbers by the numerical values that correspond to the problem they need to solve.

These abstract numbers, today commonly named "variables" or "unknowns" and used in the context of "literal calculation", have very often been represented since the 16th century by:

1. The Latin alphabet: a, b, c, d, e, ..., x, y, z; A, B, C, D, E, ...
, X, Y, Z, where the first lowercase letters of the Latin alphabet (a, b, c, d, e, ...) are often used to represent abstract constants, while the lowercase letters at the end of the Latin alphabet (x, y, z) are used to represent the entities (variables or unknowns) whose value we seek.

2. The Greek alphabet:

| Letters | Name | Letters | Name | Letters | Name |
|---|---|---|---|---|---|
| Α α | Alpha | Ι ι | Iota | Ρ ρ | Rho |
| Β β | Beta | Κ κ | Kappa | Σ σ | Sigma |
| Γ γ | Gamma | Λ λ | Lambda | Τ τ | Tau |
| Δ δ | Delta | Μ μ | Mu | Υ υ | Upsilon |
| Ε ε | Epsilon | Ν ν | Nu | Φ φ | Phi |
| Ζ ζ | Zeta | Ξ ξ | Xi | Χ χ | Chi |
| Η η | Eta | Ο ο | Omicron | Ψ ψ | Psi |
| Θ θ | Theta | Π π | Pi | Ω ω | Omega |

Table 4.10 - Greek Alphabet

which is particularly used to represent more or less complex mathematical operators (such as the indexed sum Σ, the indexed product Π, the variation δ, the infinitesimal element ε, the partial differential ∂, etc.) or variables in physics (such as ω for the pulsation, ν for the frequency, ρ for the density, etc.).

3. The modernized Hebrew alphabet (with less intensity...). As we have seen, a transfinite cardinal, for example, is denoted by the letter "aleph": $\aleph_0$.

Although these symbols can represent any number, some of them represent constants, also named "universal constants", such as the speed of light c, the gravitational constant G, the Planck constant h, the number π, etc.

We very often use still other symbols, which we will introduce and define throughout this book.

2.2.11.1 Domain of a Variable

A variable is therefore likely to take different numerical values. The set of these values can vary according to the character of the problem considered. Given two numbers $a$ and $b$ such that $a<b$:

Definitions (#35):

D1. We name "domain of definition" of a variable the set of all numerical values it is likely to take between two specified limits (endpoints) or on a set (like $\mathbb{N}$, $\mathbb{R}$, $\mathbb{R}_+$, etc.).

D2. We name "closed interval with endpoints a and b" the set of all numbers $x$ between these two values, endpoints included, and we denote it as follows:

$$[a,b]=\{x\in\mathbb{R}\mid a\leq x\leq b\}\tag{2.251}$$

The left notation is obviously named "interval notation", the right one "set-builder notation".

D3. We name "open interval with endpoints a and b" the set of all numbers $x$ between these two values, endpoints not included, and we denote it as follows:

$$]a,b[=\{x\in\mathbb{R}\mid a<x<b\}\tag{2.252}$$

D4. We name "interval closed left, open right" or "semi-closed left" the following set:

$$[a,b[=\{x\in\mathbb{R}\mid a\leq x<b\}\tag{2.253}$$

D5. We name "interval open left, closed right" or "semi-closed right" the following set:

$$]a,b]=\{x\in\mathbb{R}\mid a<x\leq b\}\tag{2.254}$$

Or, in a summarized form, as often written in Switzerland:

| Notation | Explicitly | Type |
|---|---|---|
| [a,b] | a ≤ x ≤ b | Closed bounded interval |
| [a,b[ | a ≤ x < b | Bounded interval semi-closed on a and semi-open on b (or left semi-closed and right semi-open) |
| ]a,b] | a < x ≤ b | Bounded interval semi-open on a and semi-closed on b (or left semi-open and right semi-closed) |
| ]a,b[ | a < x < b | Bounded open interval |
| ]-∞,b] | x ≤ b | Unbounded interval closed on b (or closed right) |
| ]-∞,b[ | x < b | Unbounded interval open on b (or open right) |
| [a,+∞[ | a ≤ x | Unbounded interval closed on a (or closed left) |
| ]a,+∞[ | a < x | Unbounded interval open on a (or open left) |

Table 4.11 - Summary of interval notations (Swiss usage)

and according to the international standard ISO 80000-2:2009 (since Switzerland has the art of not respecting international norms and standards):

| Notation | Explicitly | Type |
|---|---|---|
| [a,b] | a ≤ x ≤ b | Closed bounded interval |
| [a,b) | a ≤ x < b | Bounded interval semi-closed on a and semi-open on b (or left semi-closed and right semi-open) |
| (a,b] | a < x ≤ b | Bounded interval semi-open on a and semi-closed on b (or left semi-open and right semi-closed) |
| (a,b) | a < x < b | Bounded open interval |
| (-∞,b] | x ≤ b | Unbounded interval closed on b (or closed right) |
| (-∞,b) | x < b | Unbounded interval open on b (or open right) |
| [a,+∞) | a ≤ x | Unbounded interval closed on a (or closed left) |
| (a,+∞) | a < x | Unbounded interval open on a (or open left) |

Table 4.12 - Summary of interval notations (ISO 80000-2:2009)

Remarks

R1. The notation $\{x \mid a<x<b\}$ denotes the set of real numbers $x$ strictly greater than $a$ and strictly less than $b$.

R2. The fact that an interval is, for example, open on $b$ means that the real number $b$ does not belong to it. On the other hand, if it were closed on $b$, then $b$ would belong to it.

R3. If the variable $x$ can take all possible negative and positive values, we write: $]-\infty,+\infty[$, where the symbol "∞" means "infinite". Obviously, there can also be intervals unbounded on the right with a finite left endpoint, and vice versa.

R4. We will recall some of these concepts with a different approach when studying Algebra (literal calculation).

We say that the variable $x$ is an "ordered variable" if, representing its domain of definition by a horizontal axis where each point represents a value of $x$, for each pair of distinct values we can say that one is the "antecedent" and the other the "subsequent". Here the notion of antecedent and subsequent is not related to the concept of time; it just expresses how the values of the variable are ordered.

Definitions (#36):

D1. A variable is said to be "increasing" if each subsequent value is greater than each antecedent value.

D2. A variable is said to be "decreasing" if each subsequent value is smaller than each antecedent value.

D3. Increasing and decreasing variables are named "variables with monotonic variations" or simply "monotonic variables".

Arithmetic Operators

Talking about numbers as we did in the previous section naturally leads us to consider the operations of calculus.
It is therefore logical to give a non-exhaustive description of the operations that may exist between numbers. This will be the goal of this section. We will consider in this book that there are two types of key tools in arithmetic (we are not speaking of algebra, but of arithmetic!):

• Arithmetic operators: there are two basic operators, addition "+" and subtraction "−", from which we can build other operators: "multiplication" (whose contemporary symbol × was introduced by William Oughtred in the 17th century) and "division" (whose old symbol was ÷, but since the end of the 20th century we simply use the slash symbol /). These four operators are commonly named "rational operators". We will see them in more detail after defining the binary relations.

Remark

Strictly speaking, addition would be enough if we consider the usual set of real numbers $\mathbb{R}$, because subtraction is then only the addition of a negative number.

• Binary operators (relations): there are six basic binary relations (equal =, different ≠, greater than >, less than <, greater than or equal ≥, less than or equal ≤) that compare the magnitude of the elements on the left and on the right of these relations (thus two elements, hence the name "binary") in order to draw some conclusions. The majority of binary relation symbols were introduced by Viète and Harriot in the 16th and 17th centuries.

It is obviously essential to know these tools and their properties as well as possible before moving on to more strenuous calculations.

3.1 Binary Relations

Definitions (#37):

D1. Consider two non-empty sets E and F (see section Set Theory), not necessarily identical. If with some given elements x of E we can associate, by a precise mathematical rule R
(unambiguously), one element y of F, we define a "functional relation" that maps E to F and that we write:

$$R:E\to F\tag{3.1}$$

Thus, more generally, a functional relation R can be defined as a mathematical rule that associates with given elements x of E some given elements y of F. So, in this more general context, if xRy, we say that y is an "image" of x through R and that x is a "precedent" or "preimage" of y.

The set of pairs (x, y) such that xRy is a true statement generates a "graph" or "representation" of the relation R. We can represent these pairs in a suitably chosen way to make a graphical representation of the relation R. This is a type of relation to which we will come back in the section Functional Analysis, in the form $R: f(x)=y$, and which does not interest us directly in this section.

D2. Consider a non-empty set E. If we associate with this set (and only with this one!) tools to compare its elements with each other, we speak of a "binary relation" or "comparison relation", which we write, for any elements x and y of E:

$$xRy\tag{3.2}$$

Most of the time, these relations can also be presented graphically. In the case of the conventional binary comparison operators, where E is the set of natural numbers $\mathbb{N}$, relative integers $\mathbb{Z}$, rationals $\mathbb{Q}$ or reals $\mathbb{R}$, the representation is (typically...) a horizontal line; in the case of congruence (see section Number Theory), it is represented by lines in the plane whose points are given by the congruence constraint.

3.1.1 Equalities

It is difficult to define the term "equality" in a general way applicable to any situation. For our part, we will take our inspiration for this definition from the extensionality axiom of Set Theory (discussed later in another section).

Definitions (#38):

D1. Two elements are "equal" if and only if they have the same values. Strict equality is denoted by the symbol =, which therefore means "equal to" (this symbol was introduced in 1557 by Robert Recorde).
If we have $a=b$, and $c$ is any given number (or vector/matrix) and $*$ any operation (such as addition, subtraction, multiplication or division, with $c\neq0$ for division), then:

$$a*c=b*c\tag{3.3}$$

This property is used to solve or simplify any type of equation. In practice, the abbreviation "LHS" is informal shorthand for the left-hand side of an equality; similarly, "RHS" abbreviates the right-hand side.

Obviously we have (property of symmetry):

$$a=b\iff b=a\tag{3.4}$$

And also (property of transitivity):

$$a=b\ \text{and}\ b=c\implies a=c$$

We will not enumerate the other properties of equality in this section (for more details, see the section Set Theory).

D2. If two elements are not strictly equal, that is to say "unequal", we connect them by the symbol ≠ and we say they are "not equal". If we have $a>b$ or $a<b$, then:

$$a\neq b\tag{3.5}$$

There are still other equality symbols, which extend the two we have defined previously. Unfortunately, they are often misused (we could rather say that they are used in the wrong places) in most of the books available on the market (and this book is no exception):

1. ≅: Should be used for congruence, but in fact is mostly used to indicate an approximation.
2. ≈: Should be used for approximations, but in fact ≅ is used instead.
3. ≡: Should be used to say that two elements are equivalent, but in practice most people use other symbols.
4. := : Is used to say that one element is by definition equal to another one.
5. ≝: Should be used to say "equal by definition to", but in fact most people use := instead.
6. ∼: Is used most of the time in Statistics to say "follows the law...", but some practitioners use it instead to say "asymptotically equal".

3.1.2 Comparators

Comparators are tools that allow us to compare and order any pair of numbers (and also sets!). The possibility of ordering numbers is fundamental in mathematics.
Otherwise (if it were not possible to order them), many things would shock our habits; for example (some of the concepts mentioned in the following sentence have not yet been presented, but we still refer to them): no more monotonic functions (especially sequences), and, linked to this, the derivative would no longer indicate anything about the "direction of variation"; no more approximation of the roots of a polynomial by dichotomy (the classical search algorithm in an ordered set, which splits it in two at each iteration); no more segments in geometry; no more half-spaces; no more convexity; we could no longer orient space, etc. It is therefore important to be able to order things, as you can see...!

Thus, for any $a,b,c\in\mathbb{R}$, we write, when $a$ is greater than or equal to $b$:

$$a\geq b\tag{3.6}$$

and when $a$ is less than or equal to $b$:

$$a\leq b\tag{3.7}$$

Remark

It is useful to recall that the set of real numbers $\mathbb{R}$ is a totally ordered field (see section Set Theory); otherwise we could not establish order relations among its elements (which is not the case for the complex numbers $\mathbb{C}$, which we cannot order!).

Definition (#39): The symbol ≤ is an "order relation" (see the rigorous definition further below!) which means "less than or equal to", and conversely the symbol ≥ is also an order relation, which means "greater than or equal to".

Regarding strict comparison, we also have the following relatively intuitive properties:

$$a<b\ \text{and}\ b<c\implies a<c,\qquad a>b\ \text{and}\ b>c\implies a>c\tag{3.8}$$

$$a>b\ \text{and}\ b=c\implies a>c,\qquad a<b\ \text{and}\ b=c\implies a<c\tag{3.9}$$

$$a<b\ \text{and}\ c>0\implies ac<bc\tag{3.10}$$

$$a<b+c\iff a-c<b\tag{3.11}$$

$$a>b+c\iff a-c>b\tag{3.12}$$

$$0<a<b\implies\frac{1}{a}>\frac{1}{b}\tag{3.13}$$

$$b<a<0\implies\frac{1}{a}<\frac{1}{b}\tag{3.14}$$
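The comparison rules listed above can be spot-checked numerically. This is a minimal sketch, assuming random rationals built with Python's standard `fractions.Fraction` (exact arithmetic, so no rounding can fake a counterexample); `check_rules` is a hypothetical helper name of our own. Passing checks only illustrate the rules; they do not prove them.

```python
from fractions import Fraction
import random

def check_rules(trials: int = 10_000, seed: int = 0) -> bool:
    """Randomized spot check (not a proof) of the strict comparison rules, in exact arithmetic."""
    rng = random.Random(seed)

    def rand_nonzero() -> Fraction:
        # Non-zero rational in [-100, 100] with a small denominator.
        while True:
            x = Fraction(rng.randint(-100, 100), rng.randint(1, 10))
            if x != 0:
                return x

    for _ in range(trials):
        a, b, c = rand_nonzero(), rand_nonzero(), rand_nonzero()
        if a < b and b < c:
            assert a < c                      # transitivity of <
        if a > b and b == c:
            assert a > c                      # replacing a term by an equal one
        if a < b and c > 0:
            assert a * c < b * c              # multiplying by a positive number
        assert (a < b + c) == (a - c < b)     # moving a term across <
        assert (a > b + c) == (a - c > b)     # moving a term across >
        if 0 < a < b:
            assert 1 / a > 1 / b              # inverting positive numbers reverses <
        if b < a < 0:
            assert 1 / a < 1 / b              # inverting negative numbers
    return True

print(check_rules())
```

Exact rationals were chosen over floats on purpose: with floating-point numbers, a rule such as moving a term across an inequality can appear to fail because of rounding, which says nothing about the rule itself.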
We can obviously multiply, divide, add or subtract the same term on each side of the relation, and it remains true. Notice, however, that if we multiply both sides by a negative number, the comparator is reversed:

$$a>b\ \text{and}\ c<0\implies ac<bc\tag{3.15}$$

and vice versa:

$$a<b\ \text{and}\ c<0\implies ac>bc\tag{3.16}$$

We also have:

$$0<a<b\ \text{and}\ p\in\mathbb{R}^*_+\implies a^p<b^p\tag{3.17}$$

Consider now that $b<a<0$ and $p\in\mathbb{N}^*$. Then, if $p$ is an even integer:

$$0<a^p<b^p\tag{3.18}$$

else, if $p$ is odd:

$$a^p>b^p\tag{3.19}$$

This result simply comes from the rule of signs, since a power with an integer exponent is only a repeated multiplication. Finally:

$$0<a<b\ \text{and}\ n\in\mathbb{N}^*\implies\sqrt[n]{a}<\sqrt[n]{b}\tag{3.20}$$

The relations:

$$>\qquad<\qquad\leq\qquad\geq\qquad\gg\qquad\ll\tag{3.21}$$

thus correspond respectively to: (strictly) greater than, (strictly) smaller than, smaller than or equal to, greater than or equal to, much bigger than, much smaller than.

These relations can be defined in a slightly more subtle and rigorous way, and apply not only to comparators (see for example the congruence relation in the section Set Theory)! Let us see this (the vocabulary that follows is also defined in the section Set Theory):

Definition (#40): Given a set $A$, a binary relation $R$ on $A$ is a subset of the Cartesian product, $R\subseteq A\times A$ (that is to say, the binary relation generates a subset through the constraints it imposes on the elements of $A$ satisfying the relation). It is:

P1. A "reflexive relation" if:

$$\forall x\in A:\ xRx\tag{3.22}$$

P2. A "symmetric relation" if:

$$\forall x,y\in A:\ xRy\implies yRx\tag{3.23}$$

P3. An "antisymmetric relation" if:

$$\forall x,y\in A:\ (xRy\ \text{and}\ yRx)\implies x=y\tag{3.24}$$

P4. A "transitive relation" if:

$$\forall x,y,z\in A:\ (xRy\ \text{and}\ yRz)\implies xRz\tag{3.25}$$

P5. A "connex relation" if:

$$\forall x,y\in A:\ xRy\ \text{or}\ yRx\tag{3.26}$$

Mathematicians have given special names to the families of relations satisfying some of these properties.

Definitions (#41):

D1.
A relation is named "strict order relation" if and only if it is only transitive (some specify then that it is necessarily antireflexive but this last fact is then obvious...). D2. A relation is named a "preorder" if and only if it is reflexive and transitive. D3. A relation is named an "equivalence relation" if and only if it is reflexive, symmetric, and transitive. D4. A relation is named "order relation" if and only if it is reflexive, transitive and antisym- metric (thus the relations >, < are not order relations because obviously not reflexive relations). D5. A relation is named "total order relation" if and only if it is reflexive, transitive, connex and antisymmetric. For the other combinations it seems (as far as we know) that there are no special name among the mathematicians ... Remark The binary relations have all similar properties in natural sets N, relative Z, rational Q and real M (there is no natural order relation on the set of complex numbers C). V If we summarize: Binary relation = + > < < reflexive yes no no no yes yes symmetric yes yes no no no no transitive yes no yes yes yes yes connex no no no no yes yes antisymetric yes no no no yes yes Table 4.13 - Binary Relations info @ sciences. ch 161/5785 4. Arithmetic EAME v3. 5-2013 Thus we see that the binary relations <,> form with the previously mentioned sets, total order relations and it is very easy to see which binary relations are partial, total or equivalence order relations. Definition (#42): If R is an equivalence relation on A. For V.r; e A, the "equivalence class" of x is by definition the set: x {y e A: xRyj (3.27) [x] is therefore a subset of /l (x C A) which we denote also thereafter ... R (so be careful not to confuse in what follows the equivalence relation and the subset itself...). We thus have a new set that is named the "set of equivalence classes" or "quotient set" denoted in this book by A/R. 
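To make properties P1 to P5 and the quotient set concrete, here is a small Python sketch that tests them by brute force on a finite sample of integers. A finite sample can only illustrate, not prove, what holds on all of $\mathbb{Z}$, and the helper names `properties`, `equivalence_classes` and `same_parity` are hypothetical. Note that, read literally, P3 holds vacuously for the strict relation $<$, since $x < y$ and $y < x$ never hold together.

```python
from itertools import product


def properties(A, R):
    """Check properties P1-P5 of a relation R on a finite set A.

    R is a predicate R(x, y) -> bool, i.e. the subset {(x, y) : R(x, y)} of A x A.
    """
    pairs = list(product(A, repeat=2))
    return {
        "reflexive": all(R(x, x) for x in A),
        "symmetric": all(R(y, x) for x, y in pairs if R(x, y)),
        "antisymmetric": all(x == y for x, y in pairs if R(x, y) and R(y, x)),
        "transitive": all(R(x, z) for x, y in pairs for z in A if R(x, y) and R(y, z)),
        "connex": all(R(x, y) or R(y, x) for x, y in pairs),
    }


def equivalence_classes(A, R):
    """Quotient set A/R as a list of the distinct classes [x] = {y in A : x R y}."""
    classes = []
    for x in A:
        cls = frozenset(y for y in A if R(x, y))
        if cls not in classes:
            classes.append(cls)
    return classes


def same_parity(x, y):
    """x R y when x - y is divisible by 2 (same remainder in the division by 2)."""
    return (x - y) % 2 == 0


A = range(-4, 5)
print(properties(A, lambda x, y: x <= y))  # everything except symmetric
print(properties(A, lambda x, y: x < y))   # transitive and (vacuously) antisymmetric only
print(properties(A, same_parity))          # reflexive, symmetric, transitive
print(equivalence_classes(A, same_parity)) # two classes: the even and the odd integers
```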
Written out, the quotient set is:
$$A/R = \{[x] \mid x \in A\} \tag{3.28}$$
You should know that in $A/R$ we no longer look at $[x]$ as a subset of $A$, but as an element! Presented in a popularized way, an equivalence relation thus serves to stick one single label on all the items that satisfy the same property, and to identify them with that label (knowing what we do with this label).

Example: In the set of integers $\mathbb{Z}$, if we study the remainders of the division of numbers by 2, the result is always 0 or 1. The equivalence class of 0 is then named the "set of even integers" and the equivalence class of 1 the "set of odd integers". So we have two equivalence classes, which form a partition of $\mathbb{Z}$ into two parts (always keep this simple example in mind for the theoretical elements that follow, it helps a lot!). If we name the first class 0 and the second 1, we fall back on the rules of operation between even and odd numbers:
$$0 + 0 = 0 \qquad 0 + 1 = 1 \qquad 1 + 1 = 0 \tag{3.29}$$
which respectively mean that the sum of two even integers is even, that the sum of an even and an odd integer is odd, and that the sum of two odd integers is even. And for multiplication:
$$0 \times 0 = 0 \qquad 0 \times 1 = 0 \qquad 1 \times 1 = 1 \tag{3.30}$$
which respectively mean that the product of two even integers is even, that the product of an even and an odd integer is even, and that the product of two odd integers is odd.

Now, to verify that we are dealing with an equivalence relation, we still have to check that it is reflexive ($xRx$), symmetric (if $xRy$ then $yRx$) and transitive (if $xRy$ and $yRz$ then $xRz$). We will see how to check this a few paragraphs below, because this example is a very special case of a congruence relation.

Definition (#43): The application $f: A \to A/R$ defined by $x \mapsto [x]$ is named the "canonical projection". Any element $y \in [x]$ is named a "class representative" of $[x]$.

Theorem 4.9. Let $E$ be a set.
We propose to prove that there is a one-to-one correspondence between the set of equivalence relations on $E$ and the set of partitions of $E$. In other words, this theorem says that an equivalence relation on $E$ is nothing more than a partition of $E$ (which is intuitive).

Proof 4.9.1. Let $R$ be an equivalence relation on $E$. We choose $I = E/R$ as the indexing set of the partition and, for any $[x] \in E/R$, we set $E_{[x]} = [x]$. To show that the family $(E_{[x]})_{[x] \in E/R}$ is a partition of $E$, we just have to check the following two properties of the definition of a partition:

P1. Given $[x], [y] \in E/R$ such that $[x] \neq [y]$, then $E_{[x]} \cap E_{[y]} = \emptyset$. Indeed, if some $z$ belonged to both, we would have $xRz$ and $yRz$, hence $xRy$ by symmetry and transitivity, and therefore $[x] = [y]$, a contradiction.
P2. $E = \bigcup_{[x] \in E/R} E_{[x]}$, which is obvious because if $x \in E$ then $x \in [x] = E_{[x]}$ (by reflexivity).

Q.E.D.

Again, it should be easy to check with the practical example of the division by 2 given previously that the partition into even and odd numbers satisfies these two properties (if not, the reader can contact us and we will add this as an example).

We have therefore associated a partition of $E$ with the equivalence relation $R$. Conversely, if $(E_i)_{i \in I}$ is a partition of $E$, we easily verify that the relation $R$ defined by $xRy$ if and only if there exists $i \in I$ such that $x, y \in E_i$ is an equivalence relation! Both applications are thus bijective and inverse to each other.

Example: We will now apply a slightly less trivial example than the previous one to the construction of the rings $\mathbb{Z}/d\mathbb{Z}$, after a few reminders (for the concept of ring, see the section Set Theory).

Reminders:
1. Given two numbers $n, m \in \mathbb{Z}$, we say that "$n$ divides $m$" and write $n \mid m$ if and only if there exists an integer $k \in \mathbb{Z}$ such that $m = kn$ (see section Number Theory).
2. Let $d > 1$ be an integer. We define the relation $R$ by $nRm$ if and only if $d \mid (n - m)$, or in other words $nRm$ if and only if there exists $k \in \mathbb{Z}$ such that $n = m + kd$.
Usually we write $n \equiv m \pmod{d}$ instead of $nRm$, and we say that "$n$ is congruent to $m$ modulo $d$". Remember also that $n \equiv 0 \pmod{d}$ if and only if $d$ divides $n$ (see section Number Theory).

We will now introduce an equivalence relation on $\mathbb{Z}$. Let us prove that, for any integer $d > 1$, congruence modulo $d$ is an equivalence relation on $\mathbb{Z}$ (we already proved this in the section Number Theory in our study of congruences, but let us redo the work for fun...). To prove this we simply have to check the three properties of an equivalence relation:

P1. Reflexivity: $n \equiv n$ since $n = n + 0 \cdot d$.
P2. Symmetry: if $n \equiv m$ then $n = m + kd$, and therefore $m = n + (-k)d$, that is to say $m \equiv n$.
P3. Transitivity: if $n \equiv m$ and $m \equiv j$ then $n = m + kd$ and $m = j + k'd$, therefore $n = j + (k + k')d$, that is to say $n \equiv j$.

In the above situation, we denote by $\mathbb{Z}/d\mathbb{Z}$ the set of equivalence classes, and by $[n]_d$ the congruence class of a given integer $n$:
$$[n]_d = \{\ldots, n - 2d, n - d, n, n + d, n + 2d, n + 3d, \ldots\} \tag{3.31}$$
(the difference of any two values in the braces is divisible by $d$, so this is indeed an equivalence class), thus:
$$\mathbb{Z}/d\mathbb{Z} = \{[0]_d, [1]_d, [2]_d, \ldots, [d - 1]_d\} \tag{3.32}$$
In particular (trivially, since the even and the odd integers together make up all of $\mathbb{Z}$):
$$\mathbb{Z}/2\mathbb{Z} = \{[0]_2, [1]_2\} \tag{3.33}$$

Remark
The operations of addition and multiplication on $\mathbb{Z}$ also define operations of addition and multiplication on $\mathbb{Z}/d\mathbb{Z}$. We then say that these operations are compatible with the equivalence relation, and $\mathbb{Z}/d\mathbb{Z}$ then forms a ring (see section Set Theory).

3.2 Fundamental Arithmetic Laws

As we have said before, there is a fundamental operator (addition) from which we can define multiplication, subtraction (provided that the chosen set of numbers is adapted to it...) and division (provided that the chosen set of numbers is also adapted to it...)
and around which we can build all of analytical mathematics. Obviously there are some subtleties to consider when the level of rigour increases. The reader can then refer to the section Set Theory, where the fundamental laws are redefined more accurately than in what follows.

3.2.1 Addition

Definition (#44): The addition of integers is an operation, denoted "+", whose only purpose is to gather into a single number all the units contained in several others. The result of the operation is named the "sum", the "total" or the "cumulative total". The numbers to be added are named the "terms of the addition". Thus, $A, B, C, \ldots$ in $A + B + C + \ldots$ are the terms of the addition, and the result is the sum of these terms. Or, in schematic form for a special case:
$$0 + 4 + 3 = 4 + 3 = 7$$
[Figure 4.19 - One possible schema for addition: the jumps +4 and +3 on a number line graduated from 0 to 12]

Here is a list of some intuitive properties of addition that we assume without proof (as they are in fact axioms):

P1. The sum of several numbers does not depend on the order of the terms. We then say that addition is a "commutative operation". Concretely, for any two numbers:
$$A + B = B + A \tag{3.34}$$
P2. The sum of several numbers does not change if we replace two or more of them by their intermediate result. We then say that addition is an "associative operation":
$$(A + B) + C = A + (B + C) \tag{3.35}$$
P3. Zero is the neutral element of addition, because any number added to zero gives that same number:
$$A + 0 = A \tag{3.36}$$
P4. Depending on the set in which we work ($\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, ...), every number $A$ may have a term that makes the sum zero. We then say that there exists an "opposite" $-A$ such that:
$$A + (-A) = 0 \tag{3.37}$$

We can define addition more rigorously using the Peano axioms in the particular case of the natural numbers $\mathbb{N}$, as we have already seen in the section Numbers.
So, with these axioms it is possible to prove that there exists one and only one application (uniqueness), denoted "+", from $\mathbb{N} \times \mathbb{N}$ to $\mathbb{N}$ satisfying:
$$\begin{aligned} &\forall n \in \mathbb{N}: \; n + 0 = n\\ &\forall p, q \in \mathbb{N}: \; p + s(q) = s(p + q)\\ &\forall n \in \mathbb{N}: \; s(n) = n + 1 \end{aligned} \tag{3.38}$$
where $s$ means "successor".

Remark
As this book has not been written for mathematicians, we will skip the proof (relatively long and of little interest in a business context) and we will assume that the application "+" exists and is unique... and that it is entirely determined by the properties above.

Let $x_1, x_2, \ldots, x_n$ be any numbers; then we can write their sum as:
$$x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i \tag{3.39}$$
by specifying the lower and upper bounds of the indexed sum (below and above the uppercase Greek letter Sigma). Here are some properties of this condensed notation that should be obvious (if not, the reader can send us a request and we will add the details):
$$\sum_{i=1}^{n} k x_i = k \sum_{i=1}^{n} x_i \qquad \sum_{i=1}^{n} k = nk \qquad \sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i \tag{3.40}$$
where $k$ is a constant.

Let us now see some concrete examples of additions of various simple numbers, in order to practice the basics:

Examples: The addition of two relatively small numbers is quite easy, since we have learned by heart to count up to the number resulting from the operation. Therefore (examples in base ten):
$$5 + 2 = 7 \tag{3.41}$$
and:
$$10 + 3 = 13 \tag{3.42}$$
and:
$$1014 + 3 = 1017 \tag{3.43}$$
For bigger numbers we can adopt another method, which must also be learned by heart. For example:
$$9244 + 3475 = \; ? \tag{3.44}$$
The algorithm (process) is the following: we add the columns (4 columns in this example) from right to left. For the first column we have $4 + 5 = 9$, which gives:
$$\begin{array}{cccccc} & & 9 & 2 & 4 & 4\\ + & & 3 & 4 & 7 & 5\\ \hline & & & & & 9 \end{array} \tag{3.45}$$
and we continue in the same way for the second column, where we have $4 + 7 = 11$, with the difference that we now get a number $\geq 10$, so we carry its left digit over to the next (left) column of the addition. Therefore:
$$\begin{array}{cccccc} & & & {\scriptstyle +1} & & \\ & & 9 & 2 & 4 & 4\\ + & & 3 & 4 & 7 & 5\\ \hline & & & & 1 & 9 \end{array} \tag{3.46}$$
The third column is then calculated as $1 + 2 + 4 = 7$, which gives us:
$$\begin{array}{cccccc} & & & {\scriptstyle +1} & & \\ & & 9 & 2 & 4 & 4\\ + & & 3 & 4 & 7 & 5\\ \hline & & & 7 & 1 & 9 \end{array} \tag{3.47}$$
For the last column we have $9 + 3 = 12$, and once again we carry the left digit over to the next column of the addition. Therefore:
$$\begin{array}{cccccc} & {\scriptstyle +1} & & {\scriptstyle +1} & & \\ & & 9 & 2 & 4 & 4\\ + & & 3 & 4 & 7 & 5\\ \hline & & 2 & 7 & 1 & 9 \end{array} \tag{3.48}$$
Finally:
$$\begin{array}{cccccc} & {\scriptstyle +1} & & {\scriptstyle +1} & & \\ & & 9 & 2 & 4 & 4\\ + & & 3 & 4 & 7 & 5\\ \hline & 1 & 2 & 7 & 1 & 9 \end{array} \tag{3.49}$$
This example shows how we can proceed for the addition of any numbers written in base ten: we add column by column from right to left, and if the result of a column is greater than or equal to 10, we carry its left digit over to the next (left) column.

This algorithm (process or methodology) of addition is quite simple to understand and to execute. We will not go further on this subject for now.

3.2.2 Subtraction

Definition (#45): Subtraction is a mathematical operation that represents removing objects from a collection. More formally, the subtraction of the number $B$ from the number $A$, denoted by the symbol "$-$", consists in finding the number $C$ which, added to $B$, gives $A$.

Remark
As we saw in the section Set Theory, subtraction in the set $\mathbb{N}$ is possible only if $A \geq B$.

Formally, we write an inline subtraction in the form:
$$A - B = C \tag{3.50}$$
which must satisfy:
$$A = B + C \tag{3.51}$$
Or, in schematic form for a special case:
$$10 - 3 - 4 = 7 - 4 = 3$$
[Figure 4.20 - One possible schema for subtraction: the jumps -3 and -4 on a number line graduated from 0 to 12]

Here are some intuitive properties of subtraction that we assume without proof (as they can be deduced from those of addition):

P1. The subtraction of several numbers depends on the order of the terms. We then say that subtraction is a "non-commutative operation". Indeed:
$$5 - 2 \neq 2 - 5 \tag{3.52}$$
P2. The subtraction of several numbers changes if we replace two or more of them by their intermediate result. We then say that subtraction is a "non-associative operation". Indeed:
$$5 - (3 - 2) \neq (5 - 3) - 2 \tag{3.53}$$
P3. Zero is not the neutral element of subtraction. Indeed, subtracting zero from any number gives that same number, so zero is neutral on the right... but not on the left, because subtracting a nonzero number from zero does not give back that number! We then say that zero is only "neutral on the right" in the case of subtraction. Indeed:
$$0 - 5 = -5 \neq 5 \tag{3.54}$$

In more complicated cases we use a special vocabulary:
$$\begin{array}{ccccl} & {\scriptstyle -1} & & & \text{carry}\\ & 7 & 0 & 4 & \text{minuend}\\ - & 5 & 1 & 2 & \text{subtrahend}\\ \hline & 1 & 9 & 2 & \text{rest or difference} \end{array} \tag{3.55}$$
The "minuend" is 704 and the "subtrahend" is 512. The minuend digits are $m_3 = 7$, $m_2 = 0$ and $m_1 = 4$. The subtrahend digits are $s_3 = 5$, $s_2 = 1$ and $s_1 = 2$. Beginning at the ones place, 4 is not less than 2, so the difference 2 is written down in the ones place of the result. In the tens place, 0 is less than 1, so the 0 is increased by 10, and the difference with 1, which is 9, is written down in the tens place. The American method corrects for this increase of ten by reducing the digit in the hundreds place of the minuend by one. That is, the 7 is struck through and replaced by a 6. The subtraction then proceeds in the hundreds place, where 6 is not less than 5, so the difference 1 is written down in the hundreds place of the result. We are now done: the result is 192.

Let us now see some concrete examples of subtractions of various simple numbers, in order to practice the basics:

Example: The subtraction of two relatively small numbers is fairly easy once we have memorized how to count up to at least the number resulting from this operation.
So:
$$5 - 2 = 3 \tag{3.56}$$
and:
$$10 - 3 = 7 \tag{3.57}$$
and:
$$1014 - 3 = 1011 \tag{3.58}$$
For larger numbers, another method must be learned by heart (as for addition). For example:
$$4574 - 3785 = \; ? \tag{3.59}$$
We subtract the columns (4 columns in this example) from right to left. In the first column we have $4 - 5 = -1 < 0$, so we carry $-1$ over to the next (second) column and we write $10 - 1 = 9$ below the horizontal line of the first column:
$$\begin{array}{ccccc} & & & {\scriptstyle -1} & \\ & 4 & 5 & 7 & 4\\ - & 3 & 7 & 8 & 5\\ \hline & & & & 9 \end{array} \tag{3.60}$$
We continue in the same way for the second column: $7 - 8 = -1 < 0$, so we carry $-1$ over to the next (third) column and, as $-1 - 1 = -2$, we write $10 - 2 = 8$ below the horizontal line of the second column:
$$\begin{array}{ccccc} & & {\scriptstyle -1} & {\scriptstyle -1} & \\ & 4 & 5 & 7 & 4\\ - & 3 & 7 & 8 & 5\\ \hline & & & 8 & 9 \end{array} \tag{3.61}$$
The third column is calculated as $5 - 7 = -2 < 0$, so we carry $-1$ over to the next (fourth) column and, as $-1 - 2 = -3$, we write $10 - 3 = 7$ below the line of the third column:
$$\begin{array}{ccccc} & {\scriptstyle -1} & {\scriptstyle -1} & {\scriptstyle -1} & \\ & 4 & 5 & 7 & 4\\ - & 3 & 7 & 8 & 5\\ \hline & & 7 & 8 & 9 \end{array} \tag{3.62}$$
In the last column we have $4 - 3 = 1 > 0$, so we carry nothing over to the next column and, as $1 - 1 = 0$, we write 0 below the line of the fourth column:
$$\begin{array}{ccccc} & {\scriptstyle -1} & {\scriptstyle -1} & {\scriptstyle -1} & \\ & 4 & 5 & 7 & 4\\ - & 3 & 7 & 8 & 5\\ \hline & 0 & 7 & 8 & 9 \end{array} \tag{3.63}$$
That is how we subtract any numbers: we subtract column by column from right to left, adding to each column's difference the carry received from the previous column; if the result is less than zero, we write it increased by 10 below the line and carry $-1$ over to the next column.

When we mix addition and subtraction, we have the following resulting relations, which should be obvious for most readers:
$$\begin{aligned} a + (b - c) &= (a + b) - c\\ a - (b + c) &= (a - b) - c\\ a - (b - c) &= (a - b) + c\\ a - b &= (a - c) - (b - c) \end{aligned} \tag{3.64}$$
Since the methodology used for subtraction is based on exactly the same rules as for addition, we will not expand on the subject further, as this seems useless from our point of view. This method is very simple and of course requires some practice with numbers to be fully understood and mastered.
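The two column algorithms described above (carrying the left digit for addition, carrying $-1$ for subtraction) can be sketched in Python as follows. This is an illustrative implementation under simple assumptions: non-negative integers only and, for subtraction, a minuend at least as large as the subtrahend. The function names `column_add` and `column_subtract` are our own.

```python
def column_add(a: int, b: int) -> int:
    """Add two non-negative integers column by column, right to left, with carries."""
    digits_a, digits_b = str(a)[::-1], str(b)[::-1]  # ones digit first
    result, carry = [], 0
    for i in range(max(len(digits_a), len(digits_b))):
        da = int(digits_a[i]) if i < len(digits_a) else 0
        db = int(digits_b[i]) if i < len(digits_b) else 0
        s = da + db + carry
        result.append(s % 10)   # digit written below the line
        carry = s // 10         # left digit carried to the next (left) column
    if carry:
        result.append(carry)
    return int("".join(map(str, reversed(result))))


def column_subtract(a: int, b: int) -> int:
    """Subtract b from a column by column, carrying -1 when a column goes negative."""
    if not 0 <= b <= a:
        raise ValueError("this sketch only handles 0 <= b <= a")
    digits_a, digits_b = str(a)[::-1], str(b)[::-1]
    result, carry = [], 0
    for i in range(len(digits_a)):
        db = int(digits_b[i]) if i < len(digits_b) else 0
        d = int(digits_a[i]) - db + carry  # carry is 0 or -1
        if d < 0:
            d += 10      # write the result increased by 10 below the line...
            carry = -1   # ...and carry -1 over to the next column
        else:
            carry = 0
        result.append(d)
    return int("".join(map(str, reversed(result))))


print(column_add(9244, 3475))       # 12719, as in (3.49)
print(column_subtract(4574, 3785))  # 789, as in (3.63)
print(column_subtract(704, 512))    # 192, as in (3.55)
```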
3.2.3 Multiplication

Definition (#46): The multiplication of numbers is an operation whose purpose, given two numbers, one named the "multiplier" $m$ and the other the "multiplicand" $M$, is to find a third number named the "product" $P$, which is the sum of as many numbers equal to the multiplicand as there are units in the multiplier (multiplication is only a succession of additions!):
$$m \times M = \underbrace{M + M + M + \cdots + M}_{m \text{ times}} = \sum_{i=1}^{m} M = P \tag{3.65}$$
The multiplicand and the multiplier are named the "factors" of the product. In primary school, multiplication is indicated by the symbol "$\times$", in higher classes by a raised dot, or even by nothing at all when there is no possible confusion:
$$a \times b = a \cdot b = ab \tag{3.66}$$
We can define multiplication using the Peano axioms in the special case of the natural numbers $\mathbb{N}$, as we have already mentioned in the section Numbers. Thus, with these axioms it is possible to prove that there exists one and only one (unique) application, denoted "$\times$" or more often "$\cdot$", from $\mathbb{N}^2$ to $\mathbb{N}$ satisfying:
$$\begin{aligned} &\forall n \in \mathbb{N}: \; n \cdot 0 = 0\\ &\forall p, q \in \mathbb{N}: \; p(q + 1) = pq + p \end{aligned} \tag{3.67}$$

Remark
As this book has not been written for mathematicians, we will skip the proof (relatively long and of little interest in a business context) and we will assume that the application "$\times$" exists and is unique... and that it is entirely determined by the properties above.

The power is a specific notation for a special case of multiplication. When the factors are all identical in numerical value, we denote the multiplication, for example, by:
$$n \cdot n \cdot n \cdot n \cdot n \cdot n \cdot n \cdot n = n^8 \tag{3.68}$$
This is what we name the "power notation" or "exponentiation". The number in superscript is what we name the "power" or the "exponent" of the number. Exponent notation is said to appear for the first time in a book by Chuquet in 1484.
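The recursive definitions (3.38) and (3.67), together with the power notation (3.68), translate almost literally into code. The Python sketch below is only an illustration: it uses Python's built-in integers to stand for $\mathbb{N}$, takes the successor $s(n) = n + 1$ as given, and uses $q - 1$ as the predecessor of $q$. The function names are hypothetical, and the recursion is deliberately naive, so it is only suitable for small numbers.

```python
def successor(n: int) -> int:
    """s(n) = n + 1, the Peano successor (taken here as a primitive)."""
    return n + 1


def peano_add(p: int, q: int) -> int:
    """Addition from (3.38): p + 0 = p and p + s(q) = s(p + q)."""
    if q == 0:
        return p
    return successor(peano_add(p, q - 1))  # q - 1 is the predecessor of q


def peano_mul(p: int, q: int) -> int:
    """Multiplication from (3.67): p * 0 = 0 and p * (q + 1) = p * q + p."""
    if q == 0:
        return 0
    return peano_add(peano_mul(p, q - 1), p)


def power(n: int, k: int) -> int:
    """Power notation (3.68): n^k = n * n * ... * n (k factors), with n^0 = 1."""
    result = 1
    for _ in range(k):
        result = peano_mul(result, n)
    return result


print(peano_add(4, 3), peano_mul(3, 5), power(2, 8))  # 7 15 256
```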
You can check for yourself that the power notation has, for example, the following properties:
$$n^x n^y = n^{x+y} \tag{3.69}$$
and also:
$$a^x b^x = (ab)^x \tag{3.70}$$
Here are some obvious properties of multiplication that we will admit without proof (listed from a set-theoretic point of view):

P1. The product of several numbers does not depend on the order of the factors. We then say that multiplication is a "commutative operation".
P2. The product of several numbers does not change if we replace two or more of them by their intermediate result. We then say that multiplication is an "associative operation".
P3. One is the neutral element of multiplication, as any multiplicand multiplied by the multiplier 1 is equal to the multiplicand itself.
P4. A number may have a factor such that the product is equal to one (the neutral element). We then say that there exists a "multiplicative inverse" (but, strictly speaking, this depends on the set of numbers in which we work, since in some of them, such as $\mathbb{Z}$, fractions do not exist!).
P5. Multiplication is a "distributive operation" (over addition), that is to say:
$$a \cdot (b + c) = ab + ac \tag{3.71}$$
the reverse operation being named a "factorization".

Let us also introduce some special notations for multiplication:

1. Given any numbers $x_1, x_2, \ldots, x_n$ (not necessarily equal), we can write their product as:
$$x_1 \cdot x_2 \cdot \ldots \cdot x_n = \prod_{i=1}^{n} x_i \tag{3.72}$$
by specifying the lower and upper bounds of the indexed product (below and above the uppercase Greek letter "Pi"). For this notation we trivially have, for any number $k$ (on request we can give more details):
$$\prod_{i=1}^{n} k x_i = k^n \prod_{i=1}^{n} x_i \tag{3.73}$$
$$\prod_{i=1}^{n} k = k^n \tag{3.74}$$
We also have, for example:
$$\prod_{i=1}^{n} (x + y) = (x + y)^n \tag{3.75}$$

2. We define the "factorial" simply ("simply" because there is also a more complex way of defining it, through the Euler Gamma function, as done in the section Integral and Differential Calculus) by:
$$1 \times 2 \times 3 \times 4 \times \cdots \times n = n! \tag{3.76}$$
with the special fact that (only the more complex definition mentioned above makes this fact obvious...):
$$0! = 1 \tag{3.77}$$

Let us see some simple examples of basic multiplications:

Example:
E1. The multiplication of two relatively small numbers is fairly easy once we have memorized how to count up to at least the number resulting from the operation. So:
$$5 \times 2 = 10 \qquad 10 \times 3 = 30 \qquad 1014 \times 3 = 3042 \tag{3.78}$$
E2. For much larger numbers we must adopt another method, which also has to be memorized. For example:
$$\begin{array}{cccccc} & & 4 & 5 & 7 & 4\\ \times & & & & & 8\\ \hline & & & & 3 & 2\\ & & & 5 & 6 & \\ & & 4 & 0 & & \\ & 3 & 2 & & & \\ \hline & 3 & 6 & 5 & 9 & 2 \end{array} \tag{3.79}$$
This methodology is very logical if you understand how a number is built in base ten. Thus we have (assuming that the distributive property is mastered):
$$\begin{aligned} 8 \times 4574 &= 8 \times (4 \cdot 10^3 + 5 \cdot 10^2 + 7 \cdot 10^1 + 4 \cdot 10^0)\\ &= 8 \times 4000 + 8 \times 500 + 8 \times 70 + 8 \times 4\\ &= 32 \cdot 10^3 + 40 \cdot 10^2 + 56 \cdot 10^1 + 32 \cdot 10^0 = 36592 \end{aligned} \tag{3.80}$$
To avoid overloading the notation in the "vertical" multiplication method, we do not write the zeros, which would overload the calculations unnecessarily (all the more so when the multiplier and/or the multiplicand are very large numbers).

3.2.4 Division

Definition (#47): The division of integers (to start with the simplest case...) is an operation which aims, given two integers, one named the "dividend" $D$ and the other the "divisor" $d$, to find a third number named the "quotient" $Q$, which is the largest number whose product by the divisor can be subtracted from the dividend (so division results from subtraction!), the difference being named the "remainder" (or "rest") $R$.
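Definition (#47) describes the quotient as the largest number of times the divisor can be subtracted from the dividend, so that $D = Q \cdot d + R$ with $0 \leq R < d$. A direct, deliberately naive Python sketch of that idea, assuming a non-negative dividend and a positive divisor, is given below and cross-checked against Python's built-in `divmod`. The function name `euclidean_division` is ours.

```python
def euclidean_division(D: int, d: int) -> tuple[int, int]:
    """Quotient and remainder of D by d (D >= 0, d > 0) by repeated subtraction.

    Q counts how many times d can be subtracted from D; what is left is the
    remainder R, so that D = Q * d + R with 0 <= R < d.
    """
    if D < 0 or d <= 0:
        raise ValueError("this sketch assumes D >= 0 and d > 0")
    Q, R = 0, D
    while R >= d:
        R -= d
        Q += 1
    return Q, R


Q, R = euclidean_division(4574, 8)
print(Q, R)                        # 571 6
assert 4574 == Q * 8 + R and 0 <= R < 8
assert (Q, R) == divmod(4574, 8)   # Python's built-in gives the same result
```

Repeated subtraction needs $Q$ steps, which is why, in practice, the long division method is used instead; the sketch only mirrors the definition.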
Remark
In the case of real numbers there is never any remainder at the end of the division (because the quotient multiplied by the divisor always gives exactly the dividend)!

In general, in the context of integers (or of the division of algebraic polynomials), if we denote by $D$ the dividend, by $d$ the divisor, by $Q$ the quotient and by $R$ the remainder, we have the relation:
$$D = Q \cdot d + R \tag{3.81}$$
knowing that the division was initially written as:
$$D : d = \frac{D}{d} \tag{3.82}$$
We indicate the operation of division by placing between the two numbers, the dividend and the divisor, the symbol ":" or a slash "/", or even, in primary school, the symbol "$\div$". We also often use the term "fraction" (instead of "quotient") for the ratio of two numbers, in other words the division of the first by the second.

Remark
The division sign is said to be due to Gottfried Wilhelm Leibniz. The fraction bar may have been seen for the first time in the works of Leonardo Fibonacci (1202) and is probably due to the Hindus.

If we divide two integers and want an integer quotient and an integer remainder (if there is one...), we speak of "Euclidean division" (with $0 \leq R < d$ for a positive divisor $d$). For example, dividing a cake is not a Euclidean division, because the quotient is not an integer, except if we take the four quarters...:

[Figure 4.21 - Schematic example of a division (fractions): a cake divided into four parts, with three quarters of the cake in gray]

If we have:
$$D : d = D \cdot \frac{1}{d} = D \cdot d^{-1} \tag{3.83}$$
we name $\frac{1}{d} = d^{-1}$ the "inverse" of the divisor. With every number other than zero is associated an inverse that satisfies this condition. From this definition comes the notation (with $x$ being any number other than zero):
$$\frac{x}{x} = x^{1} \cdot x^{-1} = x^{1-1} = x^0 = 1 \tag{3.84}$$
In the case of two fractional numbers, we say that they are "inverse" or "reciprocal" when their product is equal to one (as in the previous relations).

Remarks
R1. A division by zero is what we name a "singularity".
That is to say that the result of the division is undefined!
R2. When we multiply the dividend and the divisor of a division (fraction) by the same positive integer, the quotient does not change (this gives an "equivalent fraction"), but the remainder is multiplied by that number.
R3. Dividing a number by a product of several factors is equivalent to dividing this number successively by each of the factors of the product, and vice versa.
R4. Fractions that are greater than 0 but less than 1 are named "proper fractions". In proper fractions, the numerator is less than the denominator. When a fraction has a numerator greater than or equal to the denominator, the fraction is an "improper fraction". An improper fraction is always greater than or equal to 1. Finally, a "mixed number" is the combination of a whole number and a proper fraction.

The properties of division with the condensed power notation (exponentiation) are typically, for example (we leave it to the reader to check them with numerical values):
$$x \cdot x \cdot x \cdot y \cdot y = x^3 \cdot y^2 \tag{3.85}$$
or another obvious example:
$$\frac{x^3}{x^2} = \frac{x \cdot x \cdot x}{x \cdot x} = \frac{x}{x} \cdot \frac{x}{x} \cdot x = 1 \cdot 1 \cdot x = x^{3-2} = x^1 = x \tag{3.86}$$
We therefore deduce that:
$$\frac{x^p}{x^q} = x^{p-q} \tag{3.87}$$
Let us recall that a prime number is an integer greater than 1 whose only positive divisors are 1 and itself (2 is prime, for example). Therefore any integer greater than 1 that is not prime has at least one prime number as a divisor (1 being excluded from the primes by definition!). The smallest divisor greater than 1 of an integer is always a prime number (we will detail the properties of prime numbers with respect to division in the section Number Theory).
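Two of the claims above lend themselves to a quick experiment: that the smallest divisor greater than 1 of an integer is prime, and remark R2 on scaling the dividend and the divisor. The Python sketch below uses plain trial division (a simple choice, not an efficient one) and the built-in `divmod`; `smallest_divisor` and `is_prime` are hypothetical helper names, and checking a finite range illustrates the claim without proving it.

```python
def smallest_divisor(n: int) -> int:
    """Smallest divisor greater than 1 of an integer n >= 2, by trial division."""
    if n < 2:
        raise ValueError("n must be at least 2")
    k = 2
    while k * k <= n:
        if n % k == 0:
            return k
        k += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def is_prime(n: int) -> bool:
    """An integer n >= 2 is prime when its smallest divisor > 1 is n itself."""
    return n >= 2 and smallest_divisor(n) == n


# The smallest divisor > 1 is always prime (checked here for 2 <= n < 2000).
assert all(is_prime(smallest_divisor(n)) for n in range(2, 2000))
print([smallest_divisor(n) for n in (91, 97, 221, 1024)])  # [7, 97, 13, 2]

# Remark R2: multiplying D and d by k keeps Q and multiplies R by k.
D, d, k = 23, 5, 3
print(divmod(D, d), divmod(k * D, k * d))  # (4, 3) (4, 9)
```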
Let us see some properties of division (some of them are already known, because they follow logically from the properties of multiplication), for nonzero denominators:
$$\begin{aligned}
&-\frac{a}{b} = \frac{-a}{b} = \frac{a}{-b}\\
&\frac{a}{b} + \frac{c}{d} = \frac{a \cdot d + b \cdot c}{b \cdot d} \qquad \frac{a}{b} - \frac{c}{d} = \frac{a \cdot d - b \cdot c}{b \cdot d}\\
&\frac{a}{b} \cdot \frac{c}{d} = \frac{a \cdot c}{b \cdot d} \qquad \frac{a/b}{c/d} = \frac{a}{b} \cdot \frac{d}{c} = \frac{a \cdot d}{b \cdot c} \qquad \frac{a}{1/b} = a \cdot b
\end{aligned} \tag{3.88}$$
where:
$$\frac{a}{b} = \frac{c}{d} \Leftrightarrow ad = bc \tag{3.89}$$
is the cross-multiplication rule on which what we name a "terms amplification" rests (multiplying the numerator and the denominator of a fraction by the same nonzero number gives an equal fraction), and:
$$\frac{a}{b} + \frac{c}{b} = \frac{a + c}{b} \tag{3.90}$$
is an operation consisting in putting everything over a "common denominator".

We also have the following properties:

P1. The division of several numbers depends on the order of the terms. We then say that division is a "non-commutative operation". This means that, when $a$ and $b$ are both nonzero and $a \neq \pm b$:
$$\frac{a}{b} \neq \frac{b}{a} \tag{3.91}$$
P2. The result of the division of several numbers changes, in general, if we replace two or more of them by their intermediate result. We then say that division is a "non-associative operation":
$$\frac{a/b}{c} \neq \frac{a}{b/c} \tag{3.92}$$
P3. One is the neutral element: if we multiply the dividend or the divisor by 1, the result of the division remains the same, and dividing by 1 leaves a number unchanged:
$$\frac{a}{b} = \frac{1 \cdot a}{b} = \frac{a}{1 \cdot b}, \qquad \frac{a}{1} = a \tag{3.93}$$
P4. For a given dividend, there is a divisor that makes the quotient equal to one (the neutral element 1). We then say that there exists a "symmetric for division", which is obviously equal to the numerator (dividend) itself.
P5. Adding the same nonzero constant $c$ to the numerator and to the denominator does not give back the initial ratio in the general case where $a \neq b$:
$$\frac{a}{b} \neq \frac{a + c}{b + c} \tag{3.94}$$

Now that we know multiplication (and therefore power notation) and division, if we consider a nonzero real number $a$ and natural numbers $p \geq q$, we have (the case $p < q$ defining negative exponents):
$$\frac{a^p}{a^q} = \frac{\overbrace{a \cdot a \cdots a}^{p \text{ times}}}{\underbrace{a \cdot a \cdots a}_{q \text{ times}}} = \underbrace{a \cdot a \cdots a}_{p - q \text{ times}} = a^{p-q} \tag{3.95}$$
and, in particular, for $p = 0$:
$$a^{-q} = \frac{1}{a^q} \tag{3.96}$$
We also obviously have:
$$\frac{a^n}{a^n} = \frac{\overbrace{a \cdot a \cdots a}^{n \text{ times}}}{\underbrace{a \cdot a \cdots a}_{n \text{ times}}} = a^{n-n} = a^0 \tag{3.97}$$
and therefore (sometimes named the "zero exponent rule of exponents"):
$$a^0 = 1 \tag{3.98}$$
Also:
$$(a^n)^m = \underbrace{a^n \cdot a^n \cdots a^n}_{m \text{ times}} = \underbrace{a \cdot a \cdots a}_{mn \text{ times}} = a^{mn} \tag{3.99}$$

3.2.4.1 n-th Root

Now that we have introduced the operations of multiplication (and power notation) and division in a simple and not too formal way, we can introduce the concept of $n$-th root. Since we know, for example, that:
$$2^3 \cdot 2^2 = 2^{3+2}$$
we can by reverse inference also write, for example:
$$2^1 = 2^{0.5 + 0.5} = 2^{0.5} \cdot 2^{0.5} = 2^{\frac{1}{2}} \cdot 2^{\frac{1}{2}} \tag{3.100}$$
and therefore fractional powers make sense! This is what we name the $n$-th root (in the above example we speak of the 2nd root, or square root). We can now define the principal $n$-th root of a number!

Definition (#48): In mathematics, the $n$-th root of a number $x$, where $n$ is a positive integer, is a number $r$ which, when raised to the power $n$, yields $x$. That is to say: $r^n = x$, where $n$ is the degree of the root. By convention we write:
$$r = x^{1/n} = \sqrt[n]{x} \tag{3.101}$$
Roots are usually written using the "radical" symbol $\sqrt{\phantom{x}}$, also named the "radix". The number $x$ under the radical is named the "radicand", and the number $n \in \mathbb{N}$ is named the "index".

From what has been said for powers, we can easily conclude that the $n$-th root of a product of several factors is the product of the $n$-th roots of each factor (for $a, b \geq 0$):
$$\sqrt[n]{a} \cdot \sqrt[n]{b} = \sqrt[n]{a \cdot b} \tag{3.102}$$
since (as seen previously):
$$(a^p)^q = a^{p \cdot q} = (a^q)^p \tag{3.103}$$
And therefore:
$$a^{\frac{p}{q}} = \sqrt[q]{a^p} = \left(\sqrt[q]{a}\right)^p \quad \text{and} \quad \sqrt[p]{\sqrt[q]{a}} = \sqrt[pq]{a} \tag{3.104}$$
Obviously it follows that:
$$\underbrace{\sqrt[n]{a} \cdot \sqrt[n]{a} \cdots \sqrt[n]{a}}_{n \text{ times}} = \left(\sqrt[n]{a}\right)^n = a \tag{3.105}$$
We also have, if $a < 0$:
$$\sqrt[n]{a^n} = a \tag{3.106}$$
if $n \in \mathbb{N}^*$ is odd, and:
$$\sqrt[n]{a^n} = |a| = -a \tag{3.107}$$
if $n \in \mathbb{N}^*$ is even. If $x < 0$ and $n \in \mathbb{N}^*$ is odd, then $\sqrt[n]{x}$ (3.108) is the real number $y$ such that:
$$y = x^{1/n} = \sqrt[n]{x} \Leftrightarrow y^n = x \tag{3.109}$$
If $n \in \mathbb{N}^*$ is even (and $x < 0$), then obviously, as we have already seen earlier, the root belongs to $\mathbb{C}$ (see section Numbers).
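Computing real $n$-th roots needs some care with the sign: in Python, raising a negative float to a fractional power such as `1 / 3` returns a complex number rather than the real odd root. The sketch below, built around the hypothetical helper `real_nth_root`, follows the cases above: it takes the root of $-x$ and restores the sign for odd $n$, and refuses even roots of negative numbers. It then checks the exponent rules (3.95), (3.96) and (3.99) on exact rationals with the standard `fractions` module, using integer exponents only so that the results stay exact.

```python
import math
from fractions import Fraction


def real_nth_root(x: float, n: int) -> float:
    """Real n-th root r of x (r**n == x up to rounding) for an integer n >= 1.

    For x < 0 a real root exists only when n is odd; for even n it lies in C.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if x >= 0:
        return x ** (1 / n)
    if n % 2 == 1:
        return -((-x) ** (1 / n))  # odd degree: root of -x, with the sign restored
    raise ValueError("an even root of a negative number is not real")


print(real_nth_root(27, 3), real_nth_root(-27, 3))  # close to 3 and -3 (floating point)
print(math.isclose(2 ** 0.5 * 2 ** 0.5, 2))         # True, as in (3.100)

# Exponent rules on exact rationals (integer exponents keep Fractions exact).
a = Fraction(3, 2)
p, q = 5, 2
assert a**p / a**q == a**(p - q)   # (3.95)
assert a**(-q) == 1 / a**q         # (3.96)
assert (a**p)**q == a**(p * q)     # (3.99)
```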
If the denominator of a fraction contains a factor of the form $\sqrt[n]{a^k}$ with $a\neq 0$ (and $0<k<n$), multiplying the numerator and the denominator by $\sqrt[n]{a^{n-k}}$ removes the root from the denominator, since:

$$\frac{x}{\sqrt[n]{a^k}} = \frac{x\sqrt[n]{a^{n-k}}}{\sqrt[n]{a^k}\,\sqrt[n]{a^{n-k}}} = \frac{x\sqrt[n]{a^{n-k}}}{\sqrt[n]{a^{k+n-k}}} = \frac{x\sqrt[n]{a^{n-k}}}{\sqrt[n]{a^n}} = \frac{x\sqrt[n]{a^{n-k}}}{|a|} \tag{3.110}$$

where the last equality is written for even n (for odd n the denominator is simply a). For example, with n = 3 and k = 1: $\frac{1}{\sqrt[3]{2}} = \frac{\sqrt[3]{4}}{2}$.

Example: A world-famous application of roots is the origin of the ISO paper formats A6, A5, A4, A3, A2, A1, A0, etc. These formats are designed (this was the goal from the start!) to keep the same proportions when a sheet is folded or cut in half across its longer side. Thus, if we denote by L the length and W the width of the sheet, we have:

$$\frac{L}{W} = \frac{W}{L/2} \Rightarrow L^2 = 2W^2 \tag{3.111}$$

Hence:

$$L = \sqrt{2}\,W \tag{3.112}$$

By definition, the A0 format has an area of 1 [m²], so for this format we have:

$$LW = 1\;[\text{m}^2] \tag{3.113}$$

Therefore we deduce that:

$$LW = \sqrt{2}\,W^2 = 1 \Rightarrow W^2 = \frac{1}{\sqrt{2}}\;[\text{m}^2] \tag{3.114}$$

and therefore:

$$W = 2^{-1/4}\cdot 1\;[\text{m}] \cong 84.1\;[\text{cm}] \tag{3.115}$$

from which we derive:

$$L = \sqrt{2}\,W = 2^{1/4}\cdot 1\;[\text{m}] \cong 118.9\;[\text{cm}] \tag{3.116}$$

3.3 Arithmetic Polynomials

Definition (#49): An "arithmetic polynomial" (not to be confused with the "algebraic polynomial" studied later in the section Algebra) is a sequence of numbers separated from each other by addition or subtraction operators (+ or −), where each number may itself be a product.

The components of the polynomial are named its "terms". When the polynomial contains a single term we speak of a "monomial", with two terms of a "binomial", and so on.

Theorem 4.10. The value of an arithmetic polynomial equals the sum of the terms preceded by a + sign minus the sum of the terms preceded by a − sign.

Proof 4.10.1.

$$n_1 - n_2 + n_3 - n_4 + n_5 - n_6 + \dots - n_{j-1} + n_j = (n_1 + n_3 + n_5 + \dots + n_j) + (-1)(n_2 + n_4 + n_6 + \dots + n_{j-1}) \tag{3.117}$$

whatever the values of the terms. □ Q.E.D.

Pulling out the negative unit −1 in this way is what we name, as we already know, a "factorization". The reverse operation is named, as we also already know, a "distribution" or "development" (expansion).

The product of several polynomials can always be replaced by a single polynomial that we name the... "resulting product". We usually proceed as follows: we multiply all the terms of the first polynomial, from left to right, successively by the first, the second, ..., the last term of the second polynomial. We obtain a first partial product. If necessary, we reduce (simplify) similar terms. We then multiply each term of the partial product successively by the first, the second, ..., the last term of the third polynomial, starting from the left, and so on.

Example:

$$\begin{aligned} P_1 P_2 P_3 &= (a+b+c)(d+e+f)(g+h+i)\\ &= a(d+e+f)(g+h+i) + b(d+e+f)(g+h+i) + c(d+e+f)(g+h+i)\\ &= ad(g+h+i) + ae(g+h+i) + af(g+h+i) + bd(g+h+i) + be(g+h+i)\\ &\quad + bf(g+h+i) + cd(g+h+i) + ce(g+h+i) + cf(g+h+i)\\ &= adg + adh + adi + aeg + aeh + aei + afg + afh + afi\\ &\quad + bdg + bdh + bdi + beg + beh + bei + bfg + bfh + bfi\\ &\quad + cdg + cdh + cdi + ceg + ceh + cei + cfg + cfh + cfi \end{aligned} \tag{3.118}$$

The product of the polynomials $P_1, P_2, \dots, P_k$ is the sum of all the products of k factors formed with one term of $P_1$, one term of $P_2$, ..., and one term of $P_k$. If there is no reduction, the number of terms equals the product of the numbers of terms $n_i$ of each polynomial, so that the final number of terms is:

$$N = \prod_{i=1}^{k} n_i \tag{3.119}$$

(in the example above, 3 · 3 · 3 = 27 terms).

3.4 Absolute Value

Definition (#50): In mathematics, the "absolute value" |x| of a real number x is the non-negative value of x without regard to its sign. Namely, |x| = x for a positive x, |x| = −x for a negative x (in which case −x is positive), and |0| = 0.
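The piecewise definition above translates directly into code. The sketch below is only an illustration: the name `abs_value` is ours (Python already provides the built-in `abs`), and it checks the definition on a few sample values against the equivalent forms $\max(-x, x)$ and $\sqrt{x^2}$ mentioned a little further on.

```python
import math


def abs_value(x: float) -> float:
    """Piecewise absolute value: +x if x > 0, -x if x < 0, 0 if x == 0."""
    if x > 0:
        return x
    if x < 0:
        return -x  # -x is positive in this branch
    return 0.0


# Compare with the equivalent forms |x| = max(-x, x) and |x| = sqrt(x**2).
for x in (-3.0, -0.5, 0.0, 2.0, 3.0):
    assert abs_value(x) == max(-x, x) == math.sqrt(x * x) == abs(x)
    print(f"|{x}| = {abs_value(x)}")
```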
For example, the absolute value of 3 is 3, and the absolute value of −3 is also 3. The absolute value of a number may be thought of as its distance from zero.

Remarks

R1. The term absolute value has been used in this sense since at least 1806. The notation |x|, with a vertical bar on each side, was introduced by Karl Weierstrass in 1841.

R2. For plots of the absolute value, the reader is referred to the Functional Analysis section of this book.

For any real number x, the "absolute value" |x| is formally given by:

$$|x| = \begin{cases} +x & \text{if } x > 0\\ -x & \text{if } x < 0\\ 0 & \text{if } x = 0 \end{cases} \tag{3.120}$$

Originally, the absolute value was defined as:

$$|x| = \sqrt{x^2} \tag{3.121}$$

We also note the following possible notation:

$$|x| = \max(-x, x) \tag{3.122}$$

and the equivalent expressions:

$$x \le |x| \tag{3.123}$$

$$|-x| = |x| \tag{3.124}$$

and also:

$$|x| \le y \Leftrightarrow -y \le x \le y \tag{3.125}$$

$$|x| \ge y \Leftrightarrow x \le -y \;\vee\; x \ge y \tag{3.126}$$

the latter being often used when solving inequalities.

Example: An inequality such as:

$$|x - 3| \le 9 \tag{3.127}$$

is solved simply by using the intuitive concept of distance. The solution is the set of real numbers whose distance from the real number 3 is less than or equal to 9, that is, the interval of center 3 and radius 9, or formally:

$$[3 - 9, 3 + 9] = [-6, 12] \tag{3.128}$$

Let us point out that it is also useful to interpret the term:

$$|x - y| = \sqrt{(x - y)^2} \tag{3.129}$$

as the (Euclidean!) distance between the two numbers x and y on the real line. Thus, equipped with the absolute-value distance, the set of real numbers becomes a metric space (see the section Topology for a rigorous introduction to what a distance is)!

The absolute value has some trivial properties that we give without proof (except on reader request), as they seem to us quite intuitive. The absolute value has the following four fundamental properties:

P1. Non-negativity:

$$|x| \ge 0 \tag{3.130}$$

P2. Positive-definiteness:

$$|x| = 0 \Leftrightarrow x = 0 \tag{3.131}$$

P3. Multiplicativeness:

$$|xy| = |x|\,|y| \tag{3.132}$$

P4. Subadditivity ("first" triangle inequality):

$$|x + y| \le |x| + |y| \tag{3.133}$$

Other important properties of the absolute value include:

P5. Idempotence (the absolute value of the absolute value is the absolute value):

$$\big||x|\big| = |x| \tag{3.134}$$

P6. Evenness (reflection symmetry of the graph):

$$|-x| = |x| \tag{3.135}$$

P7. Preservation of division (equivalent to multiplicativeness), if $y\neq 0$:

$$\left|\frac{x}{y}\right| = \frac{|x|}{|y|} \tag{3.136}$$

P8. Reverse ("second") triangle inequality (equivalent to subadditivity):

$$|x - y| \ge \big||x| - |y|\big| \tag{3.137}$$

3.5 Calculation Rules (operators priorities)

In computing (in software development in particular) we frequently speak of "operator precedence". In mathematics we speak of the "priority of operations and rules of signs". What exactly is this?

We have already seen the properties of addition, subtraction, multiplication, division and power. We therefore insist that the reader distinguish the concept of "property" from that of "priority" (which we will see immediately): they are (obviously) two completely different things!

In mathematics, we first define the priorities of the grouping symbols {[()]}:

1. Operations inside parentheses ( ) are performed first.

2. Operations inside brackets [ ] are performed next, using the results of the operations that were inside parentheses ( ).

3. Finally, from the intermediate results of the operations in ( ) and [ ], we calculate the operations inside braces { }.

Let us work through an example, which will make this clearer.
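Before working through the example below by hand, note that these grouping rules can be checked mechanically. The following is a minimal Python sketch, assuming only numbers and the operators + − * / ** appear: it maps [ ] and { } onto parentheses (Python groups only with parentheses) and evaluates the result with the standard `ast` module rather than `eval`. The function name `evaluate` is hypothetical, and the sketch does not check that each bracket is closed by the same kind of symbol.

```python
import ast
import operator

# Operators accepted by this sketch (an assumption: + - * / ** and unary minus/plus).
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate(expression: str):
    """Evaluate an arithmetic expression that may group with (), [] and {}."""
    # Python groups only with parentheses, so map [ ] and { } onto ( ).
    normalized = expression.translate(str.maketrans("[]{}", "()()"))
    tree = ast.parse(normalized, mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            # Python's parser has already applied its own priority rules.
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(tree)


print(evaluate("{[5*(8+2)+3*[4+(8+6)*2]]*(1+9)}*7+1"))
```

This prints 10221, the value obtained by hand in the example. Note also that `evaluate("-2**2")` returns -4: Python gives the power operator `**` a higher priority than unary negation, so the relative order of negation and power is not the same in every language.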
Example: Consider the calculation of the polynomial:

$$\{[5\cdot(8+2) + 3\cdot[4 + (8+6)\cdot 2]]\cdot(1+9)\}\cdot 7 + 1 \tag{3.138}$$

According to the rules defined above, we first calculate all the elements inside parentheses ( ), that is to say:

$$8 + 2 = 10 \qquad (8+6) = 14 \qquad (1+9) = 10 \tag{3.139}$$

which gives us:

$$\{[5\cdot 10 + 3\cdot[4 + 14\cdot 2]]\cdot 10\}\cdot 7 + 1 \tag{3.140}$$

Still following the rules, we now calculate the elements inside brackets, always starting with the brackets [ ] nested deepest inside other brackets. Thus we first calculate the expression [4 + 14 · 2], which is nested inside the outer bracket [5 · 10 + 3 · ...]. This gives [4 + 14 · 2] = 32 and therefore:

$$\{[5\cdot 10 + 3\cdot 32]\cdot 10\}\cdot 7 + 1 \tag{3.141}$$

It remains to calculate [5 · 10 + 3 · 32] = 146, and therefore:

$$\{146\cdot 10\}\cdot 7 + 1 \tag{3.142}$$

We now calculate the single term in braces, which gives us:

$$\{146\cdot 10\} = 1460 \tag{3.143}$$

Finally it remains:

$$1460\cdot 7 + 1 = 10221 \tag{3.144}$$

This is obviously a special case... but the idea remains the same in general.

The priority of arithmetic operators is mainly a concern of computer languages (as we have already mentioned), because there a mathematical expression can only be written on a single line, which is often a source of confusion for people without technical training. For example:

$$\frac{-a\cdot(b+c)^d}{e^f} - g \tag{3.145}$$

will be written, in many computer languages (some, such as Python, write ** instead of ^):

-a*(b+c)^d/e^f-g

A non-initiated reader could read this in many ways, for example as:

$$-a\cdot(b+c)^{d/e^f} - g \tag{3.146}$$

$$[-a\cdot(b+c)]^{d/e^f} - g \tag{3.147}$$

Thus an order of priority of the operators has logically been defined, so that the operations are carried out in the following order:

1. − Negation

2. ^ Power

3. * Multiplication and / division

4. \ Integer division (specific to computer science)

5. mod Modulo (see section Number Theory)

6. +, − Addition and subtraction

Obviously the rules for parentheses ( ), brackets [ ] and braces { } defined in mathematics also apply to computing. Thus we get, in order (replacing the result of each completed operation with a new symbol):

First the terms in parentheses, with α = b + c:

-a*(b+c)^d/e^f-g = -a*α^d/e^f-g   (3.148)

1. First the negation (rule 1), with β = −a:

-a*α^d/e^f-g = β*α^d/e^f-g   (3.149)

2. The powers (rule 2), with γ = α^d and δ = e^f:

β*α^d/e^f-g = β*γ/δ-g   (3.150)

3. We apply the multiplication (rule 3), with ε = β · γ:

β*γ/δ-g = ε/δ-g   (3.151)

4. And we apply the division (rule 3 again), with φ = ε/δ:

ε/δ-g = φ-g   (3.152)

Rules (4) and (5) do not apply to this particular example.

5. And finally the subtraction (rule 6), with η = φ − g:

φ-g = η   (3.153)

Thus, following these rules, neither a computer nor a human can (should) be wrong when interpreting an expression written on a single line.

In computer code, however, there are several operators that we do not always find in pure mathematics and whose priority frequently changes from one computer language to another. We will not dwell on this, as the subject is almost endless; here is nevertheless a short description:

• The concatenation operator "&" is evaluated before comparison operators.

• Comparison operators (such as =) all have equal priority; when several appear in one expression, they are evaluated from left to right.

In some languages (Visual Basic, for instance), the logical operators are evaluated in the following order of priority:

1. Not (¬)

2. And (∧)

3. Or (∨)

4. Xor (⊕)

5. Eqv (⇔)

6. Imp (⇒)

Now that we have seen the operator priorities, what are the rules of signs in mathematics and computer science? First, you must know that these rules only apply to multiplication and division.

Given two positive numbers (+x) and (+y).
We have:

$$(+x)\cdot(+y) = (+y)\cdot(+x) = +(x\cdot y) \tag{3.154}$$

In other words, the product of two positive numbers is positive, and this can be generalized to the product of n positive numbers.

We have:

$$(-x)\cdot(+y) = (+x)\cdot(-y) = -(x\cdot y) \tag{3.155}$$

In other words, the product of a positive number and a negative number is negative.

We have:

$$(-x)\cdot(-y) = (-y)\cdot(-x) = +(x\cdot y) \tag{3.156}$$

In other words, the product of two negative numbers is positive. This generalizes as follows: a product of n numbers is positive if it contains an even number of negative factors, and negative if it contains an odd number of negative factors.

For division, the reasoning is the same:

$$\frac{(+x)}{(+y)} = +\frac{x}{y} \qquad\text{and}\qquad \frac{(+y)}{(+x)} = +\frac{y}{x} \tag{3.157}$$

In other words, if the numerator and the denominator are both positive, the quotient is positive.

We have:

$$\frac{(+x)}{(-y)} = \frac{(-x)}{(+y)} = -\frac{x}{y} \qquad\text{and}\qquad \frac{(+y)}{(-x)} = \frac{(-y)}{(+x)} = -\frac{y}{x} \tag{3.158}$$

In other words, if either the numerator or the denominator (but not both) is negative, the quotient is necessarily negative.

We have:

$$\frac{(-x)}{(-y)} = +\frac{x}{y} \qquad\text{and}\qquad \frac{(-y)}{(-x)} = +\frac{y}{x} \tag{3.159}$$

In other words, if the numerator and the denominator are both negative, the quotient is necessarily positive.

Obviously, a subtraction of terms can be rewritten in the form:

$$x - y = x + (-1)y = -1\cdot(-x + y) \tag{3.160}$$

Number Theory

Traditionally, number theory is a branch of mathematics that deals with the properties of integers, whether natural or relative integers. More generally, its field of study covers a broad class of problems that arise naturally from the study of integers.
Number theory can be divided into several branches (algebraic number theory, computational number theory, etc.) depending on the methods used and the problems addressed.

Remark

The cross sign "×" for multiplication is said to appear for the first time in the book of Oughtred (1631). The half-height dot (the modern notation for multiplication) is owed to Leibniz. From 1544, Stifel, in one of his books, did not use any sign and denoted the product of two numbers by placing them next to each other.

We chose to introduce in this section only the subjects that are essential to the study of the mathematics and theoretical physics of this book, as well as those that are absolutely part of the general culture of the engineer (some results have applications in Biostatistics!).

4.1 Principle of good order

We will take for granted the principle (also known as the well-ordering principle) that every nonempty set $S\subseteq\mathbb{N}$ contains a smallest element.

We can use this principle to prove an important property of numbers named the "Archimedean property" or "Archimedes' axiom", which states: for all $a, b\in\mathbb{N}$ with a non-zero, there is at least one positive integer n such that:

$$n\cdot a \ge b \tag{4.1}$$

In other words, given two unequal values, there is always an integer multiple of the smaller one that is at least as large as the larger one. We name "Archimedean" the structures whose elements satisfy such a comparison property (see section Set Theory).

While this is trivial to understand in the case of integers, let us prove it, because it shows the kind of approach mathematicians use when they must prove seemingly trivial facts like this...

Proof 4.10.2. Let us suppose the opposite, namely that for all $n\in\mathbb{N}$ we have:

$$n\cdot a < b \tag{4.2}$$

If we can show that this assumption is absurd, we will have proved the Archimedean property.

Let us then consider the set:

$$S = \{\, b - na \mid n\in\mathbb{N} \,\} \tag{4.3}$$

By hypothesis, every element of S is a positive integer, so $S\subseteq\mathbb{N}$ and S is non-empty. Using the principle of good order, we deduce that there exists $s_0\in S$ such that $s_0\le s$ for all $s\in S$.
Let us write this smallest element as:

$$s_0 = b - n_0 a \tag{4.4}$$

Since by hypothesis $(n_0+1)a < b$, the number $b - (n_0+1)a$ is positive and therefore also belongs to S:

$$b - (n_0+1)a \in S \tag{4.5}$$

As $s_0$ is the smallest element of S, we must then have:

$$b - (n_0+1)a \ge b - n_0 a \tag{4.6}$$

and if we reorganize and simplify (dividing by a > 0):

$$-(n_0+1) \ge -n_0 \tag{4.7}$$

and, changing the signs (which reverses the inequality):

$$n_0 + 1 \le n_0 \tag{4.8}$$

an obvious contradiction! This contradiction shows that the initial assumption (na < b for all n) is false, and therefore the Archimedean property is proved by contradiction. □ Q.E.D.

4.2 Induction Principle

Let S be a set of natural numbers that has the following two properties:

P1. $1\in S$

P2. If $k\in S$, then $k+1\in S$

then:

$$S = \mathbb{N}\setminus\{0\} = \mathbb{N}^* \tag{4.9}$$

This is how the set of natural numbers is built (refer to the section Set Theory for the rigorous construction of the set of natural numbers with the Zermelo-Fraenkel axioms).

Theorem 4.11. Given now:

$$B = \mathbb{N}^*\setminus S \tag{4.10}$$

the symbol "\" meaning, as a reminder, "excluding". We want to prove that:

$$B = \varnothing \tag{4.11}$$

Again, even if it is trivial to understand, let us do the proof, because it shows the kind of approach mathematicians use when they must prove trivial statements like this...

Proof 4.11.1. Let us suppose the opposite, that is to say:

$$B \neq \varnothing \tag{4.12}$$

By the principle of good order, since $B\subseteq\mathbb{N}$, B must have a smallest element, which we denote by $b_0$. But since $1\in S$ by property (P1), we have $b_0 > 1$ and of course $b_0 - 1\notin B$, that is to say $b_0 - 1\in S$. By property (P2), we then have $b_0\in S$, that is to say $b_0\notin B$, which is a contradiction. □ Q.E.D.

Example: We want to show, using the induction principle, that the sum of the first n squares equals n(n + 1)(2n + 1)/6, that is to say that for n ≥ 1 we have (see section Sequences and Series):

$$1^2 + 2^2 + \dots + n^2 = \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6} \tag{4.13}$$

First, the relation is easily verified for n = 1 (since 1 = 1 · 2 · 3/6). Assuming it holds for n = k (induction hypothesis), we show that it also holds for n = k + 1:

$$\begin{aligned} 1^2 + 2^2 + \dots + k^2 + (k+1)^2 &= \sum_{i=1}^{k+1} i^2 = \frac{k(k+1)(2k+1)}{6} + (k+1)^2\\ &= \frac{(k+1)\left[k(2k+1) + 6(k+1)\right]}{6} = \frac{(k+1)(2k^2 + 7k + 6)}{6}\\ &= \frac{(k+1)(k+2)(2k+3)}{6} \end{aligned} \tag{4.14}$$

and we recover the first relation with n = k + 1, hence the result.

This proof technique is therefore of great importance in the study of arithmetic. Observation and induction have often led to suspecting laws that would have been more difficult to find a priori. The accuracy of such formulas is then established by the previous method, which gave birth to modern algebra through Fermat's and Pascal's studies of Pascal's triangle (see section Calculus).

4.3 Divisibility

Definition (#51): Given $A, B\in\mathbb{Z}$ with $A\neq 0$. We say that "A divides B (without remainder)" if there is an integer q (the quotient) such that:

$$B = Aq \tag{4.15}$$

in which case we write, to distinguish it from ordinary division:

$$A\mid B \tag{4.16}$$

Otherwise, we write:

$$A\nmid B \tag{4.17}$$

and we say that "A does not divide B".

Moreover, if $A\mid B$, we also say that "B is divisible by A" or "B is a multiple of A". In the case where $A\mid B$ and $1 < A < B$, we say that A is a "proper divisor" of B.

Moreover, it is clear that $A\mid 0$ for every $A\in\mathbb{Z}\setminus\{0\}$ (since 0 = A · 0); the case A = 0 is excluded, otherwise we would have a singularity.

Here are some basic theorems relating to divisibility:

Theorem 4.12. If $A\mid B$, then $A\mid BC$ for every $C\in\mathbb{Z}$, or more formally:

$$\forall C\in\mathbb{Z}: A\mid B \Rightarrow A\mid BC \tag{4.18}$$

Proof 4.12.1. If $A\mid B$, then there exists an integer q such that:

$$B = Aq \tag{4.19}$$

Then:

$$BC = (Aq)C = A(qC) \tag{4.20}$$

and therefore:

$$A\mid BC \tag{4.21}$$

□ Q.E.D.

Theorem 4.13. If $A\mid B$ and $B\mid C$, then $A\mid C$, or more formally:

$$A\mid B \wedge B\mid C \Rightarrow A\mid C \tag{4.22}$$

Proof 4.13.1. If $A\mid B$ and $B\mid C$, then there exist two integers q and r such that B = Aq and C = Br.
More formally:

$$A\mid B \wedge B\mid C \Rightarrow \exists\,(q, r)\in\mathbb{Z}^2 : B = Aq \wedge C = Br \tag{4.23}$$

Therefore:

$$C = Br = (Aq)r = A(qr) \tag{4.24}$$

and hence:

$$A\mid C \tag{4.25}$$

□ Q.E.D.

We then have:

Theorem 4.14. If $A\mid B$ and $A\mid C$, then:

$$A\mid(Bx + Cy) \quad \forall x, y\in\mathbb{Z} \tag{4.26}$$

Proof 4.14.1. If $A\mid B$ and $A\mid C$, then there exist two integers q and r such that B = Aq and C = Ar. It follows that:

$$Bx + Cy = (Aq)x + (Ar)y = A(qx + ry) \tag{4.27}$$

and therefore:

$$A\mid(Bx + Cy) \quad \forall x, y\in\mathbb{Z} \tag{4.28}$$

□ Q.E.D.

Theorem 4.15. If $A\mid B$ and $B\mid A$, then:

$$A = \pm B \tag{4.29}$$

Proof 4.15.1. If $A\mid B$ and $B\mid A$, then there exist two integers q and r such that B = Aq and A = Br. Hence:

$$B = (Br)q = B(qr) \tag{4.30}$$

and thus qr = 1 (B ≠ 0, because B divides A). This is only possible if q = r = ±1, and thus:

$$A = \pm B \tag{4.31}$$

□ Q.E.D.

Theorem 4.16. If $A\mid B$ and $B\neq 0$, then:

$$|A| \le |B| \tag{4.32}$$

Proof 4.16.1. If $A\mid B$, then there exists an integer q ≠ 0 (since B ≠ 0) such that B = Aq. But then:

$$|B| = |A|\,|q| \ge |A| \tag{4.33}$$

since $|q|\ge 1$. □ Q.E.D.

4.3.1 Euclidean Division

The Euclidean division is an operation that associates with two integers, named respectively the "dividend" and the "divisor", two other integers named the "quotient" and the "remainder". Initially defined only for natural numbers (with a non-zero divisor), it can be generalized to relative integers and, for example, to polynomials.

Definition (#52): We name "Euclidean division" or "integer division" of two numbers B and A the operation of dividing B by A, stopping when the remainder is strictly less than A.

Let us recall (see section Numbers) that any number that has exactly two divisors (dividing it without remainder), namely 1 and itself, is named a "prime number" (which excludes the number 1 from the list of primes), and that any pair of numbers whose only common divisor is 1 is said to be "relatively prime", "mutually prime" or "coprime".

Theorem 4.17. Let $A, B\in\mathbb{Z}$ with A > 0. The "Euclidean division theorem" states that there exist unique integers q (quotient) and r (remainder) such that:

$$B = Aq + r \Leftrightarrow r = B - Aq \tag{4.34}$$

where $0\le r < A$.
Furthermore, if $A\nmid B$, then $0 < r < A$.

Example: A cake with 9 slices (B) must be shared between 4 people (A): each person receives q = 2 slices and r = 1 slice is left over.

Figure 4.22 - The pie has 9 slices, so each of the 4 people receives 2 slices and 1 is left over.

and therefore:

$$9 = 2\cdot 4 + 1 \tag{4.35}$$

Proof 4.17.1. Let us consider the set:

$$S = \{\, r = B - qA \mid q\in\mathbb{Z},\ B - qA \ge 0 \,\} \tag{4.36}$$

It is relatively easy to see that $S\subseteq\mathbb{N}$ and that $S\neq\varnothing$ (for example, q = −|B| gives B + |B|A ≥ 0). Hence, according to the principle of good order, S contains a smallest element $r\ge 0$. Let q be the integer satisfying:

$$r = B - Aq \tag{4.37}$$

We first want to show that r < A, by assuming the opposite (proof ad absurdum), that is to say that $r\ge A$. In this case we have:

$$B - qA = r \ge A \tag{4.38}$$

which is equivalent to:

$$B - (q+1)A = r - A \ge 0 \tag{4.39}$$

but then $B - (q+1)A\in S$ and:

$$B - (q+1)A < B - qA \tag{4.40}$$

This contradicts the fact that:

$$r = B - qA \tag{4.41}$$

is the smallest element of S. So r < A. Finally, it is clear that if r = 0 we have $A\mid B$, hence the second statement of the theorem. □ Q.E.D.

Remark

In the statement of the Euclidean division, we assumed that A > 0. What do we get when A < 0? In this situation −A is positive, so we can apply the Euclidean division to B and −A. Therefore there exist integers q and r such that:

$$B = q(-A) + r \tag{4.42}$$

where $0\le r < |A|$. But this relation can be written:

$$B = (-q)A + r \tag{4.43}$$

where obviously −q is an integer. The conclusion is that the Euclidean division can be stated in a more general form: given $A, B\in\mathbb{Z}$ with $A\neq 0$, there exist two integers q and r such that:

$$B = Aq + r \tag{4.44}$$

where $0\le r < |A|$. Furthermore, if $A\nmid B$, then $0 < r < |A|$.

The integers q and r of the Euclidean division are unique. Indeed, suppose there are two other integers q' and r' such that:

$$B = Aq' + r' \tag{4.45}$$
always with $0\le r' < A$; then:

$$A(q' - q) = r - r' \tag{4.46}$$

and therefore:

$$A\mid(r - r') \tag{4.47}$$

Following Theorem 4.16, if $r - r'\neq 0$ we would have $|r - r'|\ge A$. But this last inequality is impossible since, by construction, $-A < r - r' < A$. Therefore r = r' and, since $A\neq 0$, q' = q, hence the uniqueness.

4.3.1.1 Greatest common divisor

The greatest common divisor (GCD), also known as the greatest common factor (GCF), highest common factor (HCF), greatest common measure (GCM) or highest common divisor, of two or more integers, at least one of which is non-zero, is the largest positive integer that divides each of them without remainder.

Definition (#53): Let $a, b\in\mathbb{Z}$ such that $ab\neq 0$. The "greatest common divisor" (GCD) of a and b, denoted:

$$(a, b) \tag{4.48}$$

is the positive integer d that satisfies the following two properties:

P1. $d\mid a$ and $d\mid b$ (that is, without remainder in the division!)

P2. If $c\mid a$ and $c\mid b$, then $c\le d$ and $c\mid d$.

Note that 1 is always a common divisor of any two integers.

Example: Let us consider the positive integers 36 and 54. A common divisor of 36 and 54 is a positive integer that divides both 36 and 54. For example, 1 and 2 are common divisors of 36 and 54:

$$\text{Div}(36) = \{1, 2, 3, 4, 6, 9, 12, 18, 36\} \qquad \text{Div}(54) = \{1, 2, 3, 6, 9, 18, 27, 54\} \tag{4.49}$$

Their intersection can be represented by a Venn diagram, with the following set of common divisors:

$$\text{Div}(36)\cap\text{Div}(54) = \{1, 2, 3, 6, 9, 18\}$$

so that (36, 54) = 18.

However, it is not obvious that the greatest common divisor of two integers a and b always exists (if it exists, it is unique by definition!). This is proved by the following theorem, named "Bézout's theorem", which also gives the opportunity to prove other interesting properties of two numbers, as we shall see later.

Theorem 4.18. Let $a, b\in\mathbb{Z}$ such that $ab\neq 0$. If d = (a, b) is their greatest common divisor (whose existence is established by the proof below),
then there exist two integers x and y such that:

$$d = (a, b) = ax + by \tag{4.50}$$

This relation is named "Bézout's identity", and it is a linear Diophantine equation (see section Calculus).

Proof 4.18.1. Obviously, if a and b are relatively prime, we know that d = 1. To prove Bézout's identity in general, let us first consider the set:

$$S = \{\, ax + by \mid x, y\in\mathbb{Z},\ ax + by > 0 \,\} \tag{4.51}$$

As $S\subseteq\mathbb{N}$ and $S\neq\varnothing$ (taking x = a and y = b gives $a^2 + b^2 > 0$), we can use the principle of good order and conclude that S has a smallest element d. We can then write:

$$d = ax_0 + by_0 \tag{4.52}$$

for some choice $x_0, y_0\in\mathbb{Z}$. So it is sufficient to prove that d = (a, b) to prove Bézout's identity!

Let us proceed by contradiction, assuming $d\nmid a$. Then, by the Euclidean division, there exist $q, r\in\mathbb{Z}$ such that a = qd + r, where 0 < r < d. But then:

$$r = a - qd = a - q(ax_0 + by_0) = a(1 - qx_0) + b(-qy_0) \tag{4.53}$$

Thus $r\in S$ and r < d, which contradicts the fact that d is the smallest element of S. Hence $d\mid a$ and, in the same way, $d\mid b$: d is a common divisor. Moreover, any common divisor c of a and b divides $ax_0 + by_0 = d$ (Theorem 4.14), so $c\le d$: d is indeed the greatest common divisor, which therefore always exists. □ Q.E.D.

Corollary 4.18.1. As an important corollary, let us now prove that if $a, b\in\mathbb{Z}$ with $ab\neq 0$, then:

$$S = \{\, ax + by \mid x, y\in\mathbb{Z} \,\} \tag{4.54}$$

is the set of all multiples of d = (a, b).

Proof 4.18.2. As $d\mid a$ and $d\mid b$, we necessarily have $d\mid ax + by$ for all $x, y\in\mathbb{Z}$. Let $M = \{\, nd \mid n\in\mathbb{Z} \,\}$. Our problem then reduces to proving that S = M. First, let $s\in S$: then $d\mid s$, which implies $s\in M$. Conversely, let $m\in M$: this means that m = nd for some $n\in\mathbb{Z}$. As $d = ax_0 + by_0$ for some choice of integers $x_0, y_0\in\mathbb{Z}$, then:

$$m = nd = n(ax_0 + by_0) = a(nx_0) + b(ny_0)\in S \tag{4.55}$$

□ Q.E.D.

The notation may look complicated, but spend a moment on the last equality and you will quickly understand!
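Bézout's identity only asserts that x and y exist; in practice they can be computed with the extended Euclidean algorithm, which performs the back-substitution of the Euclidean algorithm (section 4.3.2) incrementally. The Python sketch below is one common iterative formulation, not the book's own procedure, and the function name `extended_gcd` is our choice.

```python
from typing import Tuple


def extended_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (d, x, y) with d = gcd(a, b) >= 0 and a*x + b*y == d."""
    # Invariants: old_r == a*old_x + b*old_y and r == a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r  # quotient of the Euclidean division
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:
        # Normalise so that the GCD is positive (possible with negative inputs).
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


d, x, y = extended_gcd(966, 429)
print(d, x, y)  # 3 4 -9, i.e. 3 = 966*4 + 429*(-9)
```

Because the two invariants are preserved at every step, the last non-zero remainder is automatically expressed as a combination of a and b, which is exactly the statement of Bézout's identity.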
Remark

If, instead of defining the greatest common divisor of two non-zero integers, we allow one of them to be equal to 0, say $a\neq 0$ and b = 0, then $a\mid b$ and, according to our definition of the GCD, it is clear that $(a, 0) = |a|$.

Given d = (a, b) and $m\in\mathbb{Z}$, we have the following properties of the GCD (without proof, but if a reader requests them we will give the details):

P1. $(a, b + ma) = (a, b) = (a, -b)$

P2. $(am, bm) = |m|\,(a, b)$ where $m\neq 0$

P3. $\left(\frac{a}{d}, \frac{b}{d}\right) = 1$

P4. If $g\in\mathbb{Z}\setminus\{0\}$ is such that $g\mid a$ and $g\mid b$, then $\left(\frac{a}{g}, \frac{b}{g}\right) = \frac{1}{|g|}(a, b)$

In some books, these four properties are "proved" by implicitly using the property itself. We refrain from this approach, which is more ridiculous than anything else, since it simply restates the property as its own proof.

Let us now develop a method (algorithm) that will be very useful for calculating the greatest common divisor of two integers (it is also sometimes useful in computer science).

4.3.2 Euclidean Algorithm

The Euclidean algorithm is an algorithm for determining the greatest common divisor of two integers (we hesitated to put this subject in the section Theoretical Computing...).

To approach this method intuitively, think of an integer as a length and of a pair of integers as the sides of a rectangle: their GCD is then the side of the largest square that tiles (paves) the rectangle exactly (if you think about it for a moment, this is quite logical!). The algorithm decomposes the original rectangle into smaller and smaller squares, by successive Euclidean divisions of the length by the width, then of the width by the remainder, and so on until a zero remainder is obtained. We must understand this geometric approach in order to then understand the algorithm.
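The successive Euclidean divisions described above can be sketched in Python as follows. This is a minimal illustration, assuming non-negative integers that are not both zero; the function name `euclid_steps` is hypothetical. It returns the GCD together with the list of divisions performed, so each step of the rectangle-paving picture can be read off directly (for everyday use, the standard library already offers `math.gcd`).

```python
def euclid_steps(a: int, b: int):
    """Return (gcd, steps) for non-negative integers a, b, not both zero.

    Each step is a tuple (dividend, divisor, quotient, remainder) such that
    dividend == divisor * quotient + remainder.
    """
    if a < 0 or b < 0 or (a == 0 and b == 0):
        raise ValueError("expects non-negative integers, not both zero")
    dividend, divisor = max(a, b), min(a, b)
    steps = []
    while divisor != 0:
        quotient, remainder = divmod(dividend, divisor)
        steps.append((dividend, divisor, quotient, remainder))
        # The divisor becomes the new dividend, the remainder the new divisor.
        dividend, divisor = divisor, remainder
    return dividend, steps


gcd, steps = euclid_steps(21, 15)
for dividend, divisor, quotient, remainder in steps:
    print(f"{dividend} = {divisor} * {quotient} + {remainder}")
print("GCD =", gcd)
```

For (21, 15) this prints 21 = 15 * 1 + 6, then 15 = 6 * 2 + 3, then 6 = 3 * 2 + 0, and finally GCD = 3, matching the three rectangles of the example.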
Example: Let us seek the GCD of (a, b) where b = 21 and a = 15, keeping in mind that the GCD, besides dividing a and b, must leave a zero remainder! In other words, it must also divide the remainder of the division of b by a. So we start from the following 21 by 15 rectangle:

Figure 4.24 - First step of the GCD algorithm

First we check whether 15 is the GCD (we always start with the smaller number). We then divide 21 by 15, which is geometrically equivalent to:

Figure 4.25 - Second step of the GCD algorithm

15 is therefore not the GCD (as we suspected...): we immediately see that we cannot pave the rectangle with 15 by 15 squares. We have a remainder of 6 (the left-over rectangle). The GCD, if it exists, must by definition divide this remainder and leave a zero remainder. So we now have a 15 by 6 rectangle, which we try to pave, knowing that the greatest common divisor is by construction less than or equal to 6. We then have:

Figure 4.26 - Third step of the GCD algorithm

So we divide 15 by the remainder 6 (the new remainder will be less than 6, which immediately lets us test whether 6 is the GCD). We get:

Figure 4.27 - Fourth step of the GCD algorithm

Again, we cannot pave the rectangle with squares only: we have a non-zero remainder, which is 3. We now have a 6 by 3 rectangle, which we try to pave, knowing that the greatest common divisor is by construction less than or equal to 3 and that, if it exists, it will leave a zero remainder. Geometrically, we then have:

Figure 4.28 - Fifth step of the GCD algorithm

We divide 6 by 3 (the remainder will be less than 3, which lets us test immediately whether 3 is the GCD):

Figure 4.29 - Sixth step of the GCD algorithm

and it's all good!
We are left with 3, which leaves a zero remainder and divides the previous remainder 6, so it is the GCD. In the end we have:

Figure 4.30 - Summary of the GCD algorithm

Now let us see the equivalent formal approach. Let $a, b\in\mathbb{Z}$ with a > 0. Applying the Euclidean division successively (with b > a), we get the following sequence of equations:

$$\begin{aligned} b &= a q_1 + r_1 & 0 &< r_1 < a\\ a &= r_1 q_2 + r_2 & 0 &< r_2 < r_1\\ r_1 &= r_2 q_3 + r_3 & 0 &< r_3 < r_2\\ &\;\;\vdots\\ r_{j-2} &= r_{j-1} q_j + r_j & 0 &< r_j < r_{j-1}\\ r_{j-1} &= r_j q_{j+1} \end{aligned} \tag{4.56}$$

If d = (a, b), then $d = r_j$ (the last non-zero remainder), with the corresponding pseudo-code algorithm:

Algorithm 1: GCD pseudo-code algorithm
Data: a, b
Result: b
r := a mod b;
while r ≠ 0 do
    a := b;
    b := r;
    r := a mod b;
end
Display b;

Or, even more formally:

Proof 4.18.3. We first want to prove that $r_j = (a, b)$. Following property P1:

$$(a, b + ma) = (a, b) = (a, -b) \tag{4.57}$$

we have:

$$(a, b) = (a, r_1) = (r_1, r_2) = \dots = (r_{j-1}, r_j) = r_j \tag{4.58}$$

(the last equality because $r_j\mid r_{j-1}$).

To prove the second property of Euclid's algorithm (that $r_j$ is a linear combination of a and b), we write the second-to-last equation of the system in the form:

$$r_j = r_{j-2} - q_j r_{j-1} \tag{4.59}$$

Now, substituting into it the previous equation of the system, $r_{j-1} = r_{j-3} - q_{j-1} r_{j-2}$, we have:

$$r_j = r_{j-2} - q_j(r_{j-3} - q_{j-1} r_{j-2}) = (1 + q_j q_{j-1})\,r_{j-2} + (-q_j)\,r_{j-3} \tag{4.60}$$

Continuing this process, we can express $r_j$ as a linear combination of a and b. □ Q.E.D.

Example: Let us calculate the greatest common divisor of (429, 966) and express this number as a linear combination of 429 and 966.
966 = 429 • 2 + 108 429 = 108 • 3 + 105 (4.61) 108 = 105 • 1 + 3 105 = 35-3 We therefore conclude that: r j = d= (966, 429) = 3 (4.62) and, in addition, that: 3 = 108 - 105 • 1 = 108 - (429 - 108 • 3) = 108 • 4 - 429 = (966 - 429 ■ 2) • 4 - 429 (4.63) = 966 • 4 - 429 • 9 = 966 • 4 + 429 • (-9) Thus the GCD is indeed expressed as a linear combination of a and b and constitutes as such the GCD. Definition (#54): We say that the integers ai, a 2 , • • . , a n are for recall "relatively prime" if: (ai,a 2 , • • • ,a n ) = 1 (4.64) 4.3.3 Least Common Multiple The least common multiple (also named the "lowest common multiple" or "smallest common multiple") of two integers a and b, usually denoted by LCM(a, b), is the smallest positive integer that is divisible by both a and b. Since division of integers by zero is undefined, this definition has meaning only if a and b are both different from zero. The LCM is familiar from grade-school arithmetic as the "lowest common denominator LCD " (also named "smallest common denominator") that must be determined before fractions can be added, subtracted or compared. The LCM of more than two integers is also well-defined: it is the smallest positive integer that is divisible by each of them. Definitions (#55): Dl. Given a\, a 2 , • • • , a n G Z \ {0}, we say that mis a "common multiple" of ai, a 2 , . . . , a n if ai\m for i — 1, 2, . . . , n D2. Given ai,a 2 ,...,a n G Z \ {0}, we name "lowest common multiple LCM" of ai, a 2 , • • • , a n if ai\m for i = 1,2, ... ,n denoted: [ai,a 2 , . . . ,a n ] (4.65) 206/5785 info @ sciences. ch EAME v3. 5-2013 4. Arithmetic the lowest integer positive common multiple to all common multiples of ai, a 2 , . . . , a n . ^Examples: El. Let us consider the positive integers 3 and 5. A common multiple of 3 and 5 is a positive integer which is both a multiple of 3, and a multiple of 5. In other words, which is divisible by 3 and 5. We have therefore: M 3 = {3, 6, 9, 12, 15, 18, 21, 24, 27, 30, . . 
$$M_5 = \{5, 10, 15, 20, 25, 30, 35, 40, 45, \ldots\} \tag{4.66}$$

Their intersection, which can be represented by a Venn diagram, gives the following set of common multiples:

$$M_3 \cap M_5 = \{15, 30, 45, 60, \ldots\} \tag{4.67}$$

and therefore the LCM is given by:

$$\text{LCM} = \min\{15, 30, 45, 60, \ldots\} = 15 \tag{4.68}$$

Or, if it helps, here is another possible visualization of the concept:

Figure: Least Common Multiple: LCM

We see that the set of all the common multiples of 3 and 5 is exactly the set of the multiples of 15.

Remark: Let $a_1, a_2, \ldots, a_n \in \mathbb{Z} \setminus \{0\}$. Then the least common multiple exists. Indeed, consider the set $E$ of the positive natural integers $m$ that are divisible by every $a_i$, which we write:

$$E = \{m \in \mathbb{N}^* \mid a_i \mid m,\ i = 1, 2, \ldots, n\} \tag{4.69}$$

Since necessarily $|a_1 a_2 \cdots a_n| \in E$, the set is not empty and, by the well-ordering principle, $E$ contains a smallest element.

Let us now see some theorems related to the LCM:

Theorem 4.19. If $m$ is any common multiple of $a_1, a_2, \ldots, a_n$, then $[a_1, a_2, \ldots, a_n] \mid m$, that is to say that every common multiple is a multiple of the LCM.

Proof 4.19.1. Let $M = [a_1, a_2, \ldots, a_n]$. By the Euclidean division, there exist integers $q$ and $r$ such that:

$$m = qM + r, \qquad 0 \le r < M \tag{4.70}$$

It suffices to show that $r = 0$. Let us suppose that $r \neq 0$ (reductio ad absurdum). Since $a_i \mid m$ and $a_i \mid M$, we have $a_i \mid r = m - qM$, and this for $i = 1, 2, \ldots, n$. So $r$ is a positive common multiple of $a_1, a_2, \ldots, a_n$ smaller than the LCM $M$. This contradiction proves the theorem. □ Q.E.D.

Theorem 4.20. If $k > 0$, then $[k a_1, k a_2, \ldots, k a_n] = k\,[a_1, a_2, \ldots, a_n]$.

The proof will be assumed obvious (if not, as always, contact us and we will add the details!).

Theorem 4.21.

$$[a, b] \cdot (a, b) = |ab|$$

Proof 4.21.1.

Lemma 4.21.1. For this proof, we will use "Euclid's lemma", which says that if $a \mid bc$ and $(a, b) = 1$, then $a \mid c$.
In other words, Euclid's lemma captures a fundamental property of prime numbers, namely: if a prime divides the product of two numbers, it must divide at least one of those numbers. It is also named "Euclid's first theorem". This lemma is the key to the proof of the fundamental theorem of arithmetic that we will see just below. It is easily verified: we have seen that there exist $x, y \in \mathbb{Z}$ such that $1 = ax + by$, and therefore $c = acx + bcy$. But $a \mid ac$ and $a \mid bc$ imply that $a \mid (acx + bcy)$, that is to say $a \mid c$.

Let us now return to our theorem. Since $(a, b) = (a, -b)$ and $[a, b] = [a, -b]$, it suffices to prove the result for positive integers $a$ and $b$. First, consider the case where $(a, b) = 1$. The integer $[a, b]$ being a multiple of $a$, we can write $[a, b] = ma$. Thus $b \mid ma$ and, since $(a, b) = 1$, it follows by Euclid's lemma that $b \mid m$. Therefore $b \le m$ and then $ab \le am$. But $ab$ is a common multiple of $a$ and $b$, so it cannot be smaller than the LCM $am$. Therefore $ab = ma = [a, b]$.

For the general case, that is to say $(a, b) = d > 1$, we have, according to a property of the GCD:

$$\left(\frac{a}{d}, \frac{b}{d}\right) = 1 \tag{4.71}$$

and, with the result obtained just above:

$$\left[\frac{a}{d}, \frac{b}{d}\right] = \frac{a}{d} \cdot \frac{b}{d} = \frac{ab}{d^2} \tag{4.72}$$

When we multiply both sides of this equation by $d^2$ and use Theorem 4.20, we get $d\,[a, b] = ab$, that is $[a, b] \cdot (a, b) = ab$, and the proof is done. □ Q.E.D.

4.3.4 Fundamental Theorem of Arithmetic

The fundamental theorem of arithmetic says that every natural number $n > 1$ can be written as a product of primes, and that this representation is unique, except for the order in which the prime factors are arranged. The theorem establishes the importance of prime numbers: they are the building blocks of the positive integers, each positive integer being built from primes in a unique way.

Remark: This theorem is sometimes named the "factorization theorem" (somewhat improperly, since other theorems bear the same name).

So let's go:

Theorem 4.22.
Every integer greater than 1 either is prime itself or is the product of prime numbers, and this product is unique, up to the order of the factors.

Remark: This theorem is one of the main reasons why 1 is not considered a prime number: if 1 were prime, the factorization would not be unique (for example $6 = 2 \cdot 3 = 1 \cdot 2 \cdot 3$).

Proof 4.22.1. The proof uses Euclid's lemma: if a prime $p$ divides the product of two natural numbers $a$ and $b$, then $p$ divides $a$ or $p$ divides $b$ (or both). If $n$ is prime, it is the product of a single prime, namely itself, and the result is true (saying that a prime number is the product of itself is obviously an abuse of language!). Suppose now that $n$ is not prime, hence composite (and strictly greater than 1), and consider the set:

$$D = \{d \in \mathbb{N} \mid d \mid n \text{ and } 1 < d < n\} \tag{4.73}$$

So $D \subset \mathbb{N}$ and, since $n$ is composite, $D \neq \varnothing$. According to the well-ordering principle, $D$ has a smallest element $p_1$, which is prime: otherwise a divisor $e$ of $p_1$ with $1 < e < p_1$ would also belong to $D$, contradicting the minimality of $p_1$. We can then write $n = p_1 n_1$. If $n_1$ is prime, the existence part is complete. If $n_1$ is also composite, we repeat the same argument and deduce the existence of a prime $p_2$ and of an integer $n_2 < n_1$ such that $n = p_1 p_2 n_2$. Since the $n_k$ strictly decrease, by continuing we inevitably arrive at an $n_k$ that is prime. So any number can be decomposed into a product of prime factors. For uniqueness, suppose that $n = p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s$ with all factors prime: $p_1$ divides $q_1 q_2 \cdots q_s$, so by Euclid's lemma (applied repeatedly) it divides some $q_j$, hence $p_1 = q_j$ because $q_j$ is prime; cancelling this common factor and repeating the argument shows that $r = s$ and that the two factorizations coincide up to the order of the factors. □ Q.E.D.

To this day we know of no simple law that gives the $n$-th prime number $p_n$. Thus, to know whether an integer $m$ is prime, it is almost easier today to check its presence in a table of prime numbers. The elementary method is the following: given an integer $m$, to determine whether it is prime, we check whether it is divisible by the primes $p_n$ belonging to the set:

$$\{p_n \in \mathbb{N} \mid p_n \le \sqrt{m}\} \tag{4.74}$$

(if $m = ab$ with $1 < a \le b$, then $a \le \sqrt{m}$, so a composite $m$ always has a prime factor in this set).

Example: The integer 223 is divisible neither by 2, nor by 3, nor by 5, nor by 7, nor by 11, nor by 13.
It is useless to continue with the next prime number, 17, because $17^2 = 289 > 223$. We therefore conclude that the number 223 is prime.

4.3.5 Congruences (modular arithmetic)

Modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" upon reaching a certain value: the modulus (plural moduli). A familiar use of modular arithmetic is the 12-hour clock (and also the calendar), in which the day is divided into two 12-hour periods. If the time is 7:00 now, then 8 hours later it will be 3:00. Usual addition would suggest that the later time should be 7 + 8 = 15, but this is not the answer because clock time "wraps around" every 12 hours; in 12-hour time, there is no "15 o'clock". Likewise, if the clock starts at 12:00 (noon) and 21 hours elapse, then the time will be 9:00 the next day, rather than 33:00. Because the hour number starts over after it reaches 12, this is arithmetic modulo 12. According to the definition below, 12 is congruent not only to 12 itself, but also to 0, so the time named "12:00" could also be called "0:00", since 12 is congruent to 0 modulo 12.

Figure 4.31 - Time-keeping on this clock uses arithmetic modulo 12 (source: Wikipedia)

Definition (#56): Let $m \in \mathbb{Z} \setminus \{0\}$. If $a$ and $b$ have the same remainder when divided by $m$ in the Euclidean division, then we say that "$a$ is congruent to $b$ modulo $m$", and we write:

$$a \equiv b \mod (m) \tag{4.75}$$

or, equivalently, there exists an integer $k$ such that:

$$a = b + km \tag{4.76}$$

We also name the number $b$ a "residue". Thus, a residue is an integer congruent to another one, modulo a given integer $m$. The reader can verify that this requires:

$$m \mid (a - b) \tag{4.77}$$

Remarks:

R1. The reader must clearly understand that congruence means a zero remainder for the division of $a - b$ by $m$!

R2. In some books, the values 1 and $-1$ are excluded, in addition to 0, from the possible values of $m$ in the definition of congruence.

R3.
Behind the term congruence are hidden similar concepts at different levels of abstraction:

• In modular arithmetic, we say that "two integers $a$ and $b$ are congruent modulo $m$ if they have the same remainder in the Euclidean division by $m$". Equivalently, they are congruent modulo $m$ if their difference is a multiple of $m$.

• In the study of oriented angles, we say that "two measures are congruent modulo $2\pi$ [rad] if and only if their difference is a multiple of $2\pi$ [rad]". This characterizes two measures of the same angle (see section Trigonometry).

• In algebra, we speak of congruence modulo $I$ in a commutative ring (see section Set Theory) for which $I$ is an ideal: "$x$ is congruent to $y$ modulo $I$ if and only if their difference belongs to $I$". This congruence is an equivalence relation compatible with addition and multiplication, and it makes it possible to define the quotient ring of the parent ring by its ideal $I$.

• In geometry (see section Euclidean Geometry), we sometimes see the term "congruence" used for figures that can be superimposed by an isometry (same shape and same size). It is then a simple equivalence relation on the set of plane figures.

The congruence relation $\equiv$ is an equivalence relation (see section Operators); in other words, given $a, b, c, m \in \mathbb{Z}$, $m > 1$, the congruence relation is:

P1. Reflexive:
$$a \equiv a \mod (m) \tag{4.78}$$

P2. Symmetric:
$$a \equiv b \mod (m) \Rightarrow b \equiv a \mod (m) \tag{4.79}$$

P3. Transitive:
$$a \equiv b \mod (m),\ b \equiv c \mod (m) \Rightarrow a \equiv c \mod (m) \tag{4.80}$$

The properties P1 and P2 are obvious (if this is not the case, please let us know and we will develop!). We will prove only P3.

Proof 4.22.2. The assumptions imply that:

$$b = a + km, \qquad c = b + lm \tag{4.81}$$

But then:

$$c = b + lm = (a + km) + lm = a + (k + l)m \tag{4.82}$$

This proves that $a$ and $c$ are congruent modulo $m$. □ Q.E.D.

The congruence relation $\equiv$ is compatible with the sum and the product (remember that a power is ultimately an extension of the product!).
Indeed, given $a, b, a', b', m \in \mathbb{Z}$, $m > 1$, such that $a \equiv b \mod (m)$ and $a' \equiv b' \mod (m)$, then:

P1. $a + a' \equiv b + b' \mod (m)$

P2. $aa' \equiv bb' \mod (m)$

Proof 4.22.3. By hypothesis we have:

$$a = b + km, \qquad a' = b' + lm \tag{4.83}$$

But then:

$$a + a' = b + b' + (k + l)m \tag{4.84}$$

which proves P1. We also have:

$$aa' = bb' + blm + b'km + klm^2 = bb' + sm, \qquad \text{with } s = bl + b'k + klm \tag{4.85}$$

which proves P2. □ Q.E.D.

Remark: The congruence relation behaves in many respects like the equality relation. However, one property of equality is not true for congruence, namely simplification: if $ab \equiv ac \mod (m)$, we do not necessarily have $b \equiv c \mod (m)$.

Example:

$$2 \cdot 1 \equiv 2 \cdot 3 \mod (4) \quad \text{but} \quad 1 \not\equiv 3 \mod (4) \tag{4.86}$$

So far we have seen the properties of congruences involving a single modulus. We will now study the behavior of the congruence relation under a change of modulus:

P1. If $a \equiv b \mod (m)$ and $d \mid m$, then $a \equiv b \mod (d)$.

P2. If $a \equiv b \mod (r)$ and $a \equiv b \mod (s)$, then $a$ and $b$ are congruent modulo $[r, s]$.

We think these two properties are obvious, and we do not need to go into details for P1. For P2, $b - a$ is a multiple of $r$ and of $s$, since by hypothesis:

$$\frac{b - a}{r} = k, \quad \frac{b - a}{s} = l \quad \Rightarrow \quad b - a = rk = sl \tag{4.87}$$

$b - a$ is then a common multiple of $r$ and $s$, hence (Theorem 4.19) a multiple of their LCM $[r, s]$, which proves P2.

From these properties it follows that if we denote by $f(x)$ a polynomial with integer coefficients (positive or negative):

$$f(x) = Ax^n + Bx^{n-1} + Cx^{n-2} + \ldots + Kx + L \tag{4.88}$$

the congruence $a \equiv b \mod (m)$ also gives $f(a) \equiv f(b) \mod (m)$.
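The compatibility of congruence with the sum and the product, the two change-of-modulus properties and the polynomial consequence can be spot-checked numerically. The Python sketch below is only a brute-force sanity check over a small range of integers, not a proof; the helper names (`congruent`, `lcm`) and the sample polynomial `f` are hypothetical choices made for illustration.

```python
from math import gcd

def congruent(a: int, b: int, m: int) -> bool:
    """a ≡ b mod (m), i.e. m divides a - b (Definition #56)."""
    return (a - b) % m == 0

def lcm(r: int, s: int) -> int:
    """LCM [r, s] of two positive integers, using [r, s](r, s) = rs (Theorem 4.21)."""
    return r * s // gcd(r, s)

def f(x: int) -> int:
    """A sample polynomial with integer coefficients (arbitrary choice)."""
    return 3 * x**3 - 5 * x**2 + 7 * x - 2

m = 12
sample = range(-30, 31)
for a in sample:
    for b in sample:
        if not congruent(a, b, m):
            continue
        assert congruent(f(a), f(b), m)            # f(a) ≡ f(b) mod (m)
        for d in (1, 2, 3, 4, 6, 12):              # change of modulus, P1: d | m
            assert congruent(a, b, d)
        for a2 in (-7, 0, 5, 13):
            b2 = a2 + 5 * m                        # so that a2 ≡ b2 mod (m)
            assert congruent(a + a2, b + b2, m)    # compatibility with the sum
            assert congruent(a * a2, b * b2, m)    # compatibility with the product

r, s = 4, 6                                        # change of modulus, P2
for a in sample:
    for b in sample:
        if congruent(a, b, r) and congruent(a, b, s):
            assert congruent(a, b, lcm(r, s))

# Simplification fails, as in example (4.86): 2*1 ≡ 2*3 mod (4) but 1 ≢ 3 mod (4)
print(congruent(2 * 1, 2 * 3, 4), congruent(1, 3, 4))  # True False
```

Python's `%` returns a value in {0, ..., m - 1} for m > 0, so negative integers are handled without special cases.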
If we replace $x$ successively by all the integers in a polynomial $f(x)$ with integer coefficients and take the remainders modulo $m$, these remainders repeat with period $m$, since we have, whatever $m$ and $x$:

$$f(x) \equiv f(x + m) \mod (m) \tag{4.89}$$

We then deduce the impossibility of solving in integers the congruence:

$$f(x) \equiv r \mod (m) \tag{4.90}$$

if $r$ is one of the "non-residues", that is, if $r$ is congruent to none of the values $f(0), f(1), \ldots, f(m-1)$ modulo $m$.

4.3.5.1 Congruence Class

Definition (#57): We name "congruence class modulo $m$" a subset of $\mathbb{Z}$ defined by the property that two elements $a$ and $b$ of $\mathbb{Z}$ are in the same class if and only if $a \equiv b \mod (m)$; in other words, a set of elements that are all congruent to one another modulo this same $m$.

Remark: We saw in the section Operators that this is in fact an equivalence class, since congruence modulo $m$ is, as we have proved above, an equivalence relation!

Example: Let $m = 3$. We divide the set of integers into congruence classes modulo 3. Here are the three sets whose elements are congruent to one another (notice that together these classes cover all of $\mathbb{Z}$, without overlapping):

$$\{\ldots, -9, -6, -3, 0, 3, 6, 9, 12, \ldots\} \tag{4.91}$$

$$\{\ldots, -8, -5, -2, 1, 4, 7, 10, 13, \ldots\} \tag{4.92}$$

$$\{\ldots, -7, -4, -1, 2, 5, 8, 11, \ldots\} \tag{4.93}$$

Thus, for each pair of elements of a congruence class, the congruence modulo 3 holds. However, we cannot write $-9 \equiv -8 \mod (3)$, where $-9$ is in the first class and $-8$ in the second. The smallest non-negative number of the first class is 0, that of the second is 1 and that of the last is 2. Thus, we denote these three classes respectively by $[0]_3$, $[1]_3$, $[2]_3$, the index 3 indicating the modulus. It is interesting to notice that if we take any number of the first class and any number of the second class, then their sum is always in the second class.
This can be generalized and allows us to define a sum of classes modulo 3 by writing:

$$\begin{aligned}
&[0]_3 + [0]_3 = [0]_3; && [0]_3 + [1]_3 = [1]_3\\
&[0]_3 + [2]_3 = [2]_3; && [1]_3 + [1]_3 = [2]_3\\
&[1]_3 + [2]_3 = [0]_3; && [2]_3 + [2]_3 = [1]_3
\end{aligned} \tag{4.94}$$

and, in the same way, a product:

$$\begin{aligned}
&[0]_3 \times [0]_3 = [0]_3; && [0]_3 \times [1]_3 = [0]_3\\
&[0]_3 \times [2]_3 = [0]_3; && [1]_3 \times [1]_3 = [1]_3\\
&[1]_3 \times [2]_3 = [2]_3; && [2]_3 \times [2]_3 = [1]_3
\end{aligned} \tag{4.95}$$

Thus, for any $m > 1$, the congruence class:

$$a \mod (m) \tag{4.96}$$

is the set of integers congruent to $a$ modulo $m$ (and congruent modulo $m$ to one another)! This class is denoted by:

$$[a]_m := \{b \in \mathbb{Z} \mid b \equiv a \mod (m)\} \tag{4.97}$$

Remark: The parenthetical "and congruent modulo $m$ to one another" holds because congruence is an equivalence relation: as we have proved above, if $b \equiv a \mod (m)$ and $c \equiv a \mod (m)$, then $b \equiv c \mod (m)$.

Definition (#58): The set of the congruence classes $[a]_m$ (which, since congruence is an equivalence relation, are "equivalence classes") for a fixed $m$ gives what we name a "quotient set" (see section Operators). More rigorously, we speak of the "quotient set of $\mathbb{Z}$ by the congruence relation", whose elements are the congruence classes (or equivalence classes) and which forms the ring $\mathbb{Z}/m\mathbb{Z}$.

We deduce from the definition the following two trivial properties:

1. The number $b$ is in the class $[a]_m$ if and only if $a \equiv b \mod (m)$.

2. The classes $[a]_m$ and $[b]_m$ are equal if and only if $a \equiv b \mod (m)$.

Theorem 4.23. There are exactly $m$ different congruence classes modulo $m$, i.e. $[0]_m, [1]_m, \ldots, [m-1]_m$.

Proof 4.23.1. Given $m > 1$, any integer $a$ is congruent modulo $m$ to one and only one integer $r$ of the set $\{0, 1, 2, \ldots, m-1\}$ (notice well, it is important, that we restrict ourselves here to non-negative representatives, without taking the negative ones into account!). In addition, this integer $r$ is exactly the remainder of the division of $a$ by $m$.
In other words, if $0 \le r < m$, then:

$$a \equiv r \mod (m) \tag{4.98}$$

if and only if $a = qm + r$, where $q$ is the quotient of $a$ by $m$ and $r$ is the remainder. The proof is an immediate consequence of the definition of congruence and of the Euclidean division. □ Q.E.D.

Definition (#59): An integer $b$ in a congruence class modulo $m$ is named a "representative of this class" (by the equivalence relation, two representatives of the same class are clearly congruent modulo $m$ to each other).

We are now able to build an addition and a multiplication on the congruence classes. To define the sum of two classes $[a]_m$, $[b]_m$, it suffices to take one representative from each class, add them and take the congruence class of the result; by the compatibility of congruence with the sum and the product proved above, the result does not depend on the chosen representatives. Thus (see the examples above):

$$[a]_m + [b]_m = [a + b]_m \tag{4.99}$$

And the same for the multiplication:

$$[a]_m \cdot [b]_m = [a \cdot b]_m \tag{4.100}$$

By construction of the addition and the multiplication, we see that the class of 0 (zero) is the neutral element for addition:

$$[a]_m + [0]_m = [a]_m, \quad \forall a \in \mathbb{Z},\ \forall m \in \mathbb{N}^* \tag{4.101}$$

and the class of the integer 1 is the neutral element for multiplication:

$$[a]_m \cdot [1]_m = [a]_m, \quad \forall a \in \mathbb{Z},\ \forall m \in \mathbb{N}^* \tag{4.102}$$

Definition (#60): An element $[a]_m$ of $\mathbb{Z}/m\mathbb{Z}$ is a "unit" if there is an element $[b]_m \in \mathbb{Z}/m\mathbb{Z}$ such that $[a]_m \cdot [b]_m = [1]_m$.

The following theorem characterizes the classes modulo $m$ that are units in $\mathbb{Z}/m\mathbb{Z}$:

Theorem 4.24. Let $[a]$ be an element of $\mathbb{Z}/m\mathbb{Z}$. Then $[a]$ is a unit if and only if $(a, m) = 1$.

Proof 4.24.1. Suppose first that $(a, m) = 1$. Then by Bézout's theorem we have the identity:

$$as + mr = 1 \tag{4.103}$$

In other words, $as$ is congruent to 1 modulo $m$. By definition, this is equivalent to writing $[a][s] = [1]$, which shows that $[a]$ is a unit. Conversely, if $[a]$ is a unit, there exists a class $[s]$ such that $[a][s] = [1]$, that is $as = 1 + km$ for some integer $k$; any common divisor of $a$ and $m$ then divides $as - km = 1$, so $(a, m) = 1$. We have thus shown that, in the ring $\mathbb{Z}/m\mathbb{Z}$ (which has an addition, a multiplication and their neutral elements), the invertible elements are exactly the classes of the integers coprime with $m$! □ Q.E.D.
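The proof of Theorem 4.24 is constructive: the Bézout coefficient $s$ in $as + mr = 1$ is a representative of the inverse class. The Python sketch below assumes the usual iterative form of the extended Euclidean algorithm (the back-substitution of the 429/966 example, carried out forwards); the names `extended_gcd`, `inverse_mod` and `units` are hypothetical. It also recovers the coefficients of $3 = 966 \cdot 4 + 429 \cdot (-9)$ found by hand earlier.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, x, y) with d = (a, b) and a*x + b*y = d, for a, b >= 0."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r                    # quotient of the Euclidean division
        old_r, r = r, old_r - q * r       # successive remainders, as in (4.56)
        old_x, x = x, old_x - q * x       # coefficient of a in each remainder
        old_y, y = y, old_y - q * y       # coefficient of b in each remainder
    return old_r, old_x, old_y

def inverse_mod(a: int, m: int) -> int:
    """Representative s in {0, ..., m-1} of the inverse of [a] in Z/mZ."""
    d, s, _ = extended_gcd(a % m, m)      # (a mod m)*s + m*r = d (Bezout)
    if d != 1:
        raise ValueError(f"[{a}] is not a unit modulo {m}: ({a}, {m}) = {d}")
    return s % m

def units(m: int) -> list[int]:
    """Representatives in {0, ..., m-1} of the units of Z/mZ (Theorem 4.24)."""
    return [a for a in range(m) if extended_gcd(a, m)[0] == 1]

print(extended_gcd(966, 429))   # (3, 4, -9): 3 = 966*4 + 429*(-9)
print(inverse_mod(5, 6))        # 5, since 5*5 = 25 = 4*6 + 1
print(units(6))                 # [1, 5]
```

The loop keeps the invariant `old_r == a*old_x + b*old_y`, so when the remainder reaches zero the last non-zero remainder comes with its Bézout coefficients.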
4.3.5.2 Complete set of residues

Definition (#61): A set of numbers $a_0, a_1, \ldots, a_{m-1}$ forms a "complete set of residues" modulo $m$, also named a "covering system", if they satisfy $a_i \equiv i \mod (m)$ for $i = 0, 1, \ldots, m-1$.

This type of system will help us, in the section Cryptography, to introduce an important function used in the secure communication devices of the end of the 20th century and the beginning of the 21st century. To introduce this concept, consider the following finite system of congruences modulo 6:

$$\begin{aligned}
6 &\equiv 0 \mod (6)\\
13 &\equiv 1 \mod (6)\\
2 &\equiv 2 \mod (6)\\
-3 &\equiv 3 \mod (6)\\
22 &\equiv 4 \mod (6)\\
11 &\equiv 5 \mod (6)
\end{aligned} \tag{4.104}$$

where, as the reader will probably have noticed, no residue is repeated in the list and no two residues are congruent to each other modulo $m$ (it is this last point that forces us to stop at 5 in our example). We then say that the residues are "mutually incongruent". If these conditions are met, we say that the ordered set $\{6, 13, 2, -3, 22, 11\}$ is a "complete system of residues modulo $m$" (here $m = 6$), as already defined. Such a set is not unique for a given modulus. Thus, the set $\{0, 1, 2, 3, 4, 5\}$ is also a (trivial) complete system of residues modulo 6.

If we eliminate from this complete system all the numbers that are not coprime with $m$, we obtain a "reduced system of residues modulo $m$". So, in the above example, the reduced system of residues modulo 6 is $\{13, 11\}$. Reduced systems will be useful to us in the section Cryptography to prove an important result for asymmetric public-key systems.
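The two conditions of Definition #61 (exactly $m$ numbers, mutually incongruent) and the reduction to the numbers coprime with $m$ are easy to test mechanically. A minimal Python sketch, with hypothetical helper names, checked on the modulo 6 example:

```python
from math import gcd

def is_complete_system(residues: list[int], m: int) -> bool:
    """True if `residues` has m elements that are mutually incongruent modulo m."""
    return len(residues) == m and len({r % m for r in residues}) == m

def reduced_system(residues: list[int], m: int) -> list[int]:
    """Keep only the elements coprime with m (reduced system of residues)."""
    return [r for r in residues if gcd(r, m) == 1]

system = [6, 13, 2, -3, 22, 11]
print(is_complete_system(system, 6))              # True
print(reduced_system(system, 6))                  # [13, 11]
print(is_complete_system([0, 1, 2, 3, 4, 6], 6))  # False: 0 ≡ 6 mod (6)
```

`math.gcd` returns a non-negative value for negative arguments (gcd(-3, 6) = 3), and `r % m` maps -3 to 3, so negative residues need no special treatment.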
We will also see, in the section Cryptography, that the "Euler indicator function" $\phi(m)$ gives the cardinal of the reduced system of residues modulo $m$; when $m$ is prime (which is not the case in the previous example), it is equal to:

$$\phi(m) = m - 1 = \text{Card}(\{1 \le n < m \mid (n, m) = 1\}) \tag{4.105}$$

So, under the assumption that $m$ is prime, the reduced system of residues is written:

$$\{r_1, r_2, \ldots, r_{\phi(m)}\} \tag{4.106}$$

4.3.5.3 Chinese remainder theorem

In its basic form, the Chinese remainder theorem determines a number $n$ that, when divided by some given divisors, leaves given remainders. In Sun Tzu's example (stated in modern terminology): what is the smallest number $n$ that, when divided by 3, leaves a remainder of 2, when divided by 5, leaves a remainder of 3, and when divided by 7, leaves a remainder of 2?

The Chinese remainder theorem can therefore be seen as solving a linear system, but in modular arithmetic. Many students and future engineers will never use this theorem in practice, but some will meet it again in the field of cryptography (especially in the context of decryption). As always, there are several possible proofs, but we opted for the one that seemed to us the most educational.

Let $m$ and $n$ be two coprime integers. Then the following special case of a system of two congruences (see further below for an example of the resolution of a system of three congruences):

$$x \equiv a \mod (m), \qquad x \equiv b \mod (n) \tag{4.107}$$

has a unique solution modulo $mn$.

Proof 4.24.2. As $m$ and $n$ are assumed coprime, there exist two integers $u$ and $v$ such that (application of the Bézout identity proved earlier):

$$um + vn = 1 \tag{4.108}$$

Therefore we have:

$$aum + avn = a \tag{4.109}$$

That is to say:

$$avn \equiv a \mod (m) \tag{4.110}$$

Likewise, we also have:

$$bum + bvn = b \tag{4.111}$$

That is to say:

$$bum \equiv b \mod (n) \tag{4.112}$$
So, to be clear, we have so far:

$$um + vn = 1 \;\Rightarrow\; avn \equiv a \mod (m) \quad\text{and}\quad bum \equiv b \mod (n)$$

Recall that $bum \equiv b \mod (n)$ comes from $bum + bvn = b$. But we can also write, with $k = (b - a)v \in \mathbb{Z}$:

$$bv = k + av$$

Therefore:

$$bum + (k + av)\,n = b$$

from which:

$$bum + avn = b - kn \equiv b \mod (n)$$

In the same way, $avn \equiv a \mod (m)$ comes from $aum + avn = a$. But we can also write, with $k' = (a - b)u \in \mathbb{Z}$, $au = k' + bu$, so that:

$$(k' + bu)\,m + avn = a$$

Therefore:

$$bum + avn = a - k'm \equiv a \mod (m)$$

So, to be clear, we have so far:

$$\begin{aligned}
&um + vn = 1 = (m, n)\\
&avn \equiv a \mod (m), \qquad bum \equiv b \mod (n)\\
&bum + avn \equiv a \mod (m), \qquad bum + avn \equiv b \mod (n)
\end{aligned}$$

So finally we get that:

$$x = bum + avn \tag{4.124}$$

is a particular solution of the system. But we also have, $\forall i, j \in \mathbb{Z}$, by the definition of congruence:

$$bum + avn + im \equiv a \mod (m), \qquad bum + avn + jn \equiv b \mod (n) \tag{4.125}$$

For the same $x$ to remain a solution of both congruences, the added terms must be equal, which is the case for instance with:

$$i = n, \qquad j = m \tag{4.126}$$

and therefore:

$$bum + avn + nm \equiv a \mod (m), \qquad bum + avn + mn \equiv b \mod (n) \tag{4.127}$$

Therefore a slightly more general solution is:

$$x = bum + avn + nm \tag{4.128}$$

By extension, we have the general solution:

$$x = bum + avn + znm \tag{4.129}$$

with $z \in \mathbb{Z}$. Conversely, if $x$ and $x'$ are two solutions, then $x - x'$ is divisible by $m$ and by $n$, hence by $[m, n] = mn$ (property P2 of the change of modulus, with $(m, n) = 1$): all the solutions are of this form. We then sometimes say that the solution is unique "modulo $nm$".
□ Q.E.D.

Example: As an example, consider the problem of finding an integer $x$ such that:

$$x \equiv 2 \mod (3), \qquad x \equiv 3 \mod (4), \qquad x \equiv 1 \mod (5) \tag{4.130}$$

A brute-force approach converts these congruences into sets and writes out their elements up to the product $3 \cdot 4 \cdot 5 = 60$ (the solutions modulo 60 of each congruence):

$$\begin{aligned}
x &\in \{2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 53, 56, 59\}\\
x &\in \{3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59\}\\
x &\in \{1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56\}
\end{aligned} \tag{4.131}$$

To find an $x$ that satisfies all three congruences, we intersect the three sets and get:

$$x = 11 \tag{4.132}$$

This solution is modulo 60, hence all the solutions are expressed as:

$$x \equiv 11 \mod (60) \tag{4.133}$$

Another way to find a solution uses basic algebra, modular arithmetic and stepwise substitution. We start by translating these congruences into equations for some integers $t$, $s$ and $u$:

$$x = 2 + 3t, \qquad x = 3 + 4s, \qquad x = 1 + 5u \tag{4.134}$$

We start by substituting the $x$ of the first equation into the second congruence:

$$2 + 3t \equiv 3 \mod (4) \tag{4.135}$$

That is to say:

$$3t \equiv 1 \mod (4) \tag{4.136}$$

Hence (since $3 \cdot 3 = 9 \equiv 1 \mod (4)$):

$$t \equiv 3 \mod (4) \tag{4.137}$$

meaning that $t = 3 + 4s$ for some integer $s$. We now substitute $t$ into the first equation:

$$x = 2 + 3t = 2 + 3(3 + 4s) = 11 + 12s \tag{4.138}$$

We substitute this $x$ into the third congruence:

$$11 + 12s \equiv 1 \mod (5) \tag{4.139}$$

That is to say (since $11 \equiv 1$ and $12 \equiv 2 \mod (5)$):

$$1 + 2s \equiv 1 \mod (5) \tag{4.140}$$

Hence:

$$2s \equiv 0 \mod (5) \tag{4.141}$$

meaning (since $(2, 5) = 1$, by Euclid's lemma) that $s = 0 + 5u$ for some integer $u$. Finally:

$$x = 11 + 12s = 11 + 12(5u) = 11 + 60u \tag{4.142}$$

So the non-negative solutions are $\{11, 71, 131, 191, \ldots\}$.

4.3.6 Continued fraction

A continued fraction is an expression obtained through an iterative process of representing a number as the sum of its integer part and the reciprocal of another number, then writing this other number as the sum of its integer part and another reciprocal, and so on.
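For a rational number $a/b$, the iterative process just described (take the integer part, then continue with the reciprocal of what remains) is exactly the sequence of Euclidean divisions. A minimal Python sketch under that observation, using exact integer arithmetic; the function names `continued_fraction` and `evaluate` are hypothetical:

```python
from fractions import Fraction

def continued_fraction(a: int, b: int) -> list[int]:
    """Terms [q1; q2, ..., qn] of a/b (b > 0) via successive Euclidean divisions."""
    terms = []
    while b != 0:
        q, r = divmod(a, b)   # a = q*b + r with 0 <= r < b
        terms.append(q)       # integer part
        a, b = b, r           # continue with the reciprocal b/r of the remainder
    return terms

def evaluate(terms: list[int]) -> Fraction:
    """Rebuild q1 + 1/(q2 + 1/(... + 1/qn)) from the innermost term outwards."""
    value = Fraction(terms[-1])
    for q in reversed(terms[:-1]):
        value = q + 1 / value
    return value

print(continued_fraction(415, 93))   # [4, 2, 6, 7]
print(evaluate([4, 2, 6, 7]))        # 415/93
```

On 966/429 the terms are [2, 3, 1, 35], i.e. exactly the quotients of the Euclidean divisions in the GCD example of (4.61).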
In a finite continued fraction (or terminated continued fraction), the iteration/recursion is terminated after finitely many steps by using an integer in lieu of another continued fraction. In contrast, an infinite continued fraction is an infinite expression. In either case, all integers in the sequence, other than the first, must be positive. The integers $a_i$ are named the "coefficients" or "terms" of the continued fraction.

The notion of continued fraction goes back to the time of Fermat and culminated with the work of Lagrange and Legendre in the late 18th century. These fractions are important in physics: we find them in our study of acoustics, in the thought process that led Galois to create his group theory, and in the study of gear ratios (for watch complications, as discussed in the section Mechanical Engineering).

To understand the motivation for continued fractions, let us introduce a basic example. Consider a typical rational number:

$$\frac{415}{93}$$

which is around 4.4624. As a first approximation, start with 4, which is the integer part:

$$\frac{415}{93} = 4 + \frac{43}{93} \tag{4.143}$$

$$\frac{415}{93} = 4 + \cfrac{1}{\frac{93}{43}} \tag{4.144}$$

Note that the fractional part is the reciprocal of 93/43, which is about 2.1628. Use the integer part, 2, as an approximation of this reciprocal, to get a second approximation: $4 + \frac{1}{2} = 4.5$. So far we have:

$$\frac{93}{43} = 2 + \frac{7}{43} \tag{4.145}$$

$$\frac{415}{93} = 4 + \cfrac{1}{2 + \cfrac{7}{43}} \tag{4.146}$$

Note that the fractional part 7/43 is the reciprocal of 43/7, which is about 6.1429. Use the integer part, 6, as an approximation of this reciprocal, to get a third approximation: $4 + \cfrac{1}{2 + \cfrac{1}{6}} = \frac{58}{13} \approx 4.4615$. Therefore:

$$\frac{43}{7} = 6 + \frac{1}{7} \tag{4.147}$$

$$\frac{415}{93} = 4 + \cfrac{1}{2 + \cfrac{1}{6 + \cfrac{1}{7}}} \tag{4.148}$$

Note that the fractional part 1/7 is the reciprocal of 7, which is itself an integer.
Using its integer part, 7, no fractional part remains:

$$\frac{7}{1} = 7 \tag{4.149}$$

and the process stops. Therefore we get:

$$\frac{415}{93} = 4 + \cfrac{1}{2 + \cfrac{1}{6 + \cfrac{1}{7}}} \tag{4.150}$$

This expression is named, as we know, the "continued fraction representation of the number". Dropping the less essential parts of the expression gives the abbreviated notation:

$$\frac{415}{93} = [4; 2, 6, 7] \tag{4.152}$$

Note that it is customary to replace only the first comma by a semicolon.

As a generalization of the previous example, let us first consider the rational number $a/b$ with $(a, b) = 1$, $b > 0$ and $a > b$. We know that, within the Euclidean division, all the quotients $q_i$ and the remainders $r_i$ are positive integers. Let us recall the Euclidean algorithm already seen earlier (but written in a slightly different way):

$$\frac{a}{b} = q_1 + \frac{r_1}{b}, \quad \frac{b}{r_1} = q_2 + \frac{r_2}{r_1}, \quad \frac{r_1}{r_2} = q_3 + \frac{r_3}{r_2}, \quad \ldots, \quad \frac{r_{n-2}}{r_{n-1}} = q_n \tag{4.153}$$

By successive substitutions, we get:

$$\frac{a}{b} = q_1 + \frac{r_1}{b} = q_1 + \cfrac{1}{b/r_1} = q_1 + \cfrac{1}{q_2 + \cfrac{r_2}{r_1}} = \ldots = q_1 + \cfrac{1}{q_2 + \cfrac{1}{q_3 + \cfrac{1}{\ddots + \cfrac{1}{q_n}}}} \tag{4.154}$$

So any positive rational number can be expressed as a finite continued fraction where $q_n \in \mathbb{N}$. Taking our introductory example:

$$\frac{415}{93} = 4 + \cfrac{1}{\frac{93}{43}} = 4 + \cfrac{1}{2 + \cfrac{1}{6 + \cfrac{1}{7}}} \tag{4.156}$$

we indeed notice that $q_n \in \mathbb{N}$ and that, writing $\varepsilon_n$ for the fractional part remaining at step $n$ (with $\varepsilon_0 = b/a$), we have by construction:

$$q_n = \left[\frac{1}{\varepsilon_{n-1}}\right] \tag{4.157}$$

where the brackets represent the integer part, and that we also have:

$$\frac{1}{\varepsilon_{n-1}} = q_n + \varepsilon_n \tag{4.158}$$

This expansion is named the "development of the number $a/b$ in finite continued fraction" and is condensed in the following notation:

$$[q_1; q_2, q_3, \ldots, q_n] \tag{4.159}$$

Let us now see another example:

Example: Let us see how to extract the square root of a number $A$ (for example $A = 2$, if we want to extract $\sqrt{2}$) by the continued fraction method. Let $a$ be the largest integer whose square $a^2$ is smaller than $A$.
We subtract it from $A$. So there is a remainder of (for $A = 2$, we have $a = 1$):

$$r = A - a^2 = (\sqrt{A} - a)(\sqrt{A} + a) \tag{4.160}$$

where we have used a remarkable identity that we will prove later in the section Calculus. Hence, dividing both members by the second parenthesis, we have:

$$\sqrt{A} - a = \frac{r}{\sqrt{A} + a} \tag{4.161}$$

Therefore:

$$\sqrt{A} = a + \frac{r}{a + \sqrt{A}} \tag{4.162}$$

In the denominator, we replace $\sqrt{A}$ by:

$$a + \frac{r}{a + \sqrt{A}} \tag{4.163}$$

That gives:

$$\sqrt{A} = a + \cfrac{r}{2a + \cfrac{r}{a + \sqrt{A}}} \tag{4.164}$$

etc. We thus see that this process makes it simple to determine the expression of a square root in terms of a continued fraction. For $A = 2$ ($a = 1$, $r = 1$), repeating the substitution indefinitely gives $\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \ddots}} = [1; 2, 2, 2, \ldots]$.

We now take as intuitive that every rational number can be expressed as a finite continued fraction and, conversely, that any finite continued fraction represents a rational number. By extension, an irrational number is represented by an infinite continued fraction!

Now consider a finite continued fraction $[q_1; q_2, q_3, \ldots, q_n]$. The continued fraction:

$$C_k = [q_1; q_2, \ldots, q_k] \tag{4.165}$$

where $k = 1, 2, \ldots, n$, is named the "$k$-th reduced" or "$k$-th convergent".

With this notation, we have:

$$\begin{aligned}
C_1 &= [q_1] = q_1\\
C_2 &= [q_1; q_2] = q_1 + \frac{1}{q_2} = \frac{q_1 q_2 + 1}{q_2}\\
C_3 &= [q_1; q_2, q_3] = q_1 + \cfrac{1}{q_2 + \cfrac{1}{q_3}} = \frac{(q_1 q_2 + 1)\, q_3 + q_1}{q_2 q_3 + 1}\\
C_4 &= [q_1; q_2, q_3, q_4] = \frac{\big((q_1 q_2 + 1)\, q_3 + q_1\big)\, q_4 + (q_1 q_2 + 1)}{(q_2 q_3 + 1)\, q_4 + q_2}
\end{aligned} \tag{4.166}$$

To simplify the expressions above, we introduce the sequences $\{n_i\}$, $\{d_i\}$ ($n$ for numerator and $d$ for denominator) defined, for $i \ge 2$, by:

$$\begin{aligned}
&n_0 = 1,\ n_1 = q_1,\ \ldots,\ n_i = q_i\, n_{i-1} + n_{i-2}\\
&d_0 = 0,\ d_1 = 1,\ \ldots,\ d_i = q_i\, d_{i-1} + d_{i-2}
\end{aligned} \tag{4.167}$$
Thanks to this construction, we have an immediate little inequality that will be useful further below:

$$0 = d_0 < d_1 \le d_2 < d_3 < \ldots \tag{4.168}$$

With the above definition, we find that:

$$C_1 = \frac{n_1}{d_1}, \quad C_2 = \frac{n_2}{d_2}, \quad C_3 = \frac{n_3}{d_3} \tag{4.169}$$

or, generalizing:

$$C_k = [q_1; q_2, q_3, \ldots, q_k] = \frac{n_k}{d_k} \tag{4.170}$$

Now let us show, for later use, that for $i \ge 1$ we have:

$$n_i\, d_{i-1} - d_i\, n_{i-1} = (-1)^i \tag{4.171}$$

The result is immediate for $i = 1$, since $n_1 d_0 - d_1 n_0 = q_1 \cdot 0 - 1 \cdot 1 = -1$. Assuming that the result is true for $i$, let us show that it is true for $i + 1$. Since:

$$\begin{aligned}
n_{i+1}\, d_i - d_{i+1}\, n_i &= (q_{i+1} n_i + n_{i-1})\, d_i - (q_{i+1} d_i + d_{i-1})\, n_i\\
&= -(n_i\, d_{i-1} - d_i\, n_{i-1}) = -(-1)^i = (-1)^{i+1}
\end{aligned} \tag{4.172}$$

using the induction hypothesis, we get the result!

We can now establish a vital relation for what follows.

Theorem 4.25. If $C_k$ is the $k$-th convergent of the simple finite continued fraction $[q_1; q_2, \ldots, q_n]$, then:

$$C_1 < C_3 < C_5 < \ldots < C_6 < C_4 < C_2 \tag{4.173}$$

Proof 4.25.1.

$$\begin{aligned}
C_{k+2} - C_k &= (C_{k+2} - C_{k+1}) + (C_{k+1} - C_k)\\
&= \left(\frac{n_{k+2}}{d_{k+2}} - \frac{n_{k+1}}{d_{k+1}}\right) + \left(\frac{n_{k+1}}{d_{k+1}} - \frac{n_k}{d_k}\right)\\
&= \frac{n_{k+2} d_{k+1} - n_{k+1} d_{k+2}}{d_{k+2} d_{k+1}} + \frac{n_{k+1} d_k - n_k d_{k+1}}{d_{k+1} d_k}\\
&= \frac{(-1)^{k+2}}{d_{k+2} d_{k+1}} + \frac{(-1)^{k+1}}{d_{k+1} d_k}
= \frac{(-1)^{k+2} d_k + (-1)^{k+1} d_{k+2}}{d_{k+2} d_{k+1} d_k}
= \frac{(-1)^{k+1} (d_{k+2} - d_k)}{d_{k+2} d_{k+1} d_k}
\end{aligned} \tag{4.174}$$

As (for $k \ge 1$):

$$0 = d_0 < d_1 \le d_2 < \ldots \quad\Rightarrow\quad d_{k+2} - d_k = q_{k+2}\, d_{k+1} > 0 \tag{4.175}$$

the sign of $C_{k+2} - C_k$ is the same as that of $(-1)^{k+1}$. It follows that $C_{k+2} > C_k$ for odd $k$, and $C_{k+2} < C_k$ for even $k$. Then:

$$C_1 < C_3 < C_5 < \ldots \quad\text{and}\quad C_2 > C_4 > C_6 > \ldots \tag{4.176}$$

Moreover:

$$C_k - C_{k-1} = \frac{n_k d_{k-1} - n_{k-1} d_k}{d_k d_{k-1}} = \frac{(-1)^k}{d_k d_{k-1}} \tag{4.177}$$

So for even $k$ we have:

$$C_k > C_{k-1} \tag{4.178}$$

Combining these results (for $i$ odd and $j$ even: if $i < j$, then $C_i \le C_{j-1} < C_j$; if $i > j$, then $C_i < C_{i-1} \le C_j$), we deduce that:

$$C_1 < C_3 < C_5 < \ldots < C_6 < C_4 < C_2 \tag{4.179}$$

□ Q.E.D.
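The recurrences (4.167) make it unnecessary to re-evaluate nested fractions for each convergent. The Python sketch below (the names `convergents` and `check_convergent_properties` are hypothetical) computes the pairs $(n_k, d_k)$ with exact integers and checks identity (4.171) and the ordering (4.173) of Theorem 4.25 on a given list of terms; it is a numerical illustration, not a substitute for the proof.

```python
from fractions import Fraction

def convergents(terms: list[int]) -> list[tuple[int, int]]:
    """Pairs (n_k, d_k), k = 1..len(terms), from the recurrences (4.167)."""
    n_prev, d_prev = 1, 0          # n_0 = 1, d_0 = 0
    n, d = terms[0], 1             # n_1 = q_1, d_1 = 1
    pairs = [(n, d)]
    for q in terms[1:]:
        n_prev, n = n, q * n + n_prev    # n_i = q_i n_{i-1} + n_{i-2}
        d_prev, d = d, q * d + d_prev    # d_i = q_i d_{i-1} + d_{i-2}
        pairs.append((n, d))
    return pairs

def check_convergent_properties(terms: list[int]) -> bool:
    """Check identity (4.171) and the ordering (4.173) for the given terms."""
    full = [(1, 0)] + convergents(terms)           # prepend (n_0, d_0)
    for i in range(1, len(full)):
        (n_i, d_i), (n_p, d_p) = full[i], full[i - 1]
        if n_i * d_p - d_i * n_p != (-1) ** i:     # identity (4.171)
            return False
    c = [Fraction(n, d) for n, d in full[1:]]      # C_1, C_2, ..., C_n
    odd, even = c[0::2], c[1::2]                   # C_1, C_3, ... and C_2, C_4, ...
    increasing = all(x < y for x, y in zip(odd, odd[1:]))
    decreasing = all(x > y for x, y in zip(even, even[1:]))
    separated = not even or max(odd) < min(even)
    return increasing and decreasing and separated

print(convergents([4, 2, 6, 7]))                            # [(4, 1), (9, 2), (58, 13), (415, 93)]
print(check_convergent_properties([1, 2, 2, 2, 2, 2, 2]))   # True
```

With the terms [4, 2, 6, 7] of 415/93 the convergents are 4, 9/2, 58/13 and finally 415/93; with [1, 2, 2, 2, 2, 2, 2] they are the classical approximations 1, 3/2, 7/5, 17/12, 41/29, 99/70, 239/169 of $\sqrt{2}$.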
Let us now show that every infinite continued fraction represents a real number. In formal terms, if $\{q_n\}$ is a sequence of positive integers and we consider $C_n = [q_1; q_2, \ldots, q_n]$, then $C_n$ necessarily converges to a real number when $n \to +\infty$. Indeed, it is not difficult to observe (it's quite intuitive) with a practical example that:

$$C_k - C_{k-1} \to 0 \tag{4.180}$$

when $k \to +\infty$: by (4.177), $|C_k - C_{k-1}| = 1/(d_k d_{k-1})$, and the $d_k$ grow without bound. Since, by Theorem 4.25, the odd convergents increase, the even ones decrease and each odd convergent is smaller than each even one, the two subsequences are squeezed towards a single limit.

Now, let $x$ be any real number and $q_1 = [x]$ the integer part of this real number. Then, as we saw at the beginning of our study of continued fractions:

$$x = q_1 + \varepsilon_1, \qquad 0 \le \varepsilon_1 < 1 \tag{4.181}$$

Therefore:

$$\varepsilon_1 = x - q_1 \tag{4.182}$$

For the needs of the section Acoustics, let us use the previous relation to calculate the continued fraction of a logarithm! First let us recall that:

$$x = q_1 + \varepsilon_1 = q_1 + \cfrac{1}{1/\varepsilon_1} = q_1 + \cfrac{1}{q_2 + \varepsilon_2}, \qquad \text{and in general} \quad \frac{1}{\varepsilon_{n-1}} = q_n + \varepsilon_n \tag{4.183}$$

and that (relation proved in the section Functional Analysis):

$$x = \log_a(u) = \frac{\ln(u)}{\ln(a)} \tag{4.184}$$

with $1 < a < u$ and $(a, u) = 1$. Let $y_n$ be defined by:

$$y_{-1} = u, \qquad y_0 = a, \qquad y_{n+1} = \frac{y_{n-1}}{y_n^{\,q_{n+1}}} \tag{4.185}$$

Let us prove that:

$$q_n = \left[\frac{\ln(y_{n-2})}{\ln(y_{n-1})}\right], \qquad \varepsilon_n = \frac{\ln(y_n)}{\ln(y_{n-1})} \tag{4.186}$$

Indeed, for $n = 1$ we have:

$$q_1 = [x] = \left[\frac{\ln(u)}{\ln(a)}\right] = \left[\frac{\ln(y_{-1})}{\ln(y_0)}\right] \tag{4.187}$$

For $n = 2$, we first use the fact that:

$$\varepsilon_1 = x - q_1 = \frac{\ln(u)}{\ln(a)} - q_1 = \frac{\ln(u) - q_1 \ln(a)}{\ln(a)} = \frac{\ln(u) - \ln(a^{q_1})}{\ln(a)} = \frac{\ln(u/a^{q_1})}{\ln(a)} = \frac{\ln(y_1)}{\ln(y_0)} \tag{4.188}$$

Therefore:

$$\frac{1}{\varepsilon_1} = \frac{\ln(y_0)}{\ln(y_1)} \tag{4.189}$$

and, as we had proved that $1/\varepsilon_{n-1} = q_n + \varepsilon_n$:

$$q_2 = \left[\frac{1}{\varepsilon_1}\right] = \left[\frac{\ln(y_0)}{\ln(y_1)}\right] \tag{4.190}$$

and so on: by induction, this justifies these changes of notation.
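The recurrence (4.185)-(4.186) can be run with exact rational arithmetic, which avoids floating-point logarithms entirely: since $y_{n-1} > 1$, the integer part $[\ln(y_{n-2})/\ln(y_{n-1})]$ is simply the largest integer $q$ such that $y_{n-1}^q \le y_{n-2}$. A minimal Python sketch under that reformulation (the function name `log_continued_fraction` is hypothetical; the coprimality condition $(a, u) = 1$ is not needed by this sketch), applied to $\log_2(3)$ of the following example:

```python
from fractions import Fraction

def log_continued_fraction(u: int, a: int, count: int) -> list[int]:
    """First `count` terms [q1; q2, ...] of the continued fraction of log_a(u).

    Exact version of y_{n+1} = y_{n-1} / y_n^{q_{n+1}}: since y_{n-1} > 1,
    [ln(y_{n-2}) / ln(y_{n-1})] is the largest q with y_{n-1}^q <= y_{n-2}.
    """
    if not 1 < a < u:
        raise ValueError("this sketch assumes 1 < a < u")
    y_prev, y = Fraction(u), Fraction(a)        # y_{-1} = u, y_0 = a
    terms: list[int] = []
    while len(terms) < count:
        q, power = 0, Fraction(1)
        while power * y <= y_prev:              # largest q with y^q <= y_prev
            power *= y
            q += 1
        terms.append(q)
        y_prev, y = y, y_prev / power           # next y = y_prev / y^q
        if y == 1:                              # log_a(u) is rational: expansion ends
            break
    return terms

print(log_continued_fraction(3, 2, 6))   # [1, 1, 1, 2, 2, 3]
```

The successive values of `y` are 3/2, 4/3, 9/8, 256/243, ..., matching $y_1 = 3/2$ in the hand computation; because $\log_2(3)$ is irrational the expansion never ends, so `count` bounds the number of terms.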
Example: Let us look for the expression of the continued fraction of:

$$x = \log_a(u) = \log_2(3) \tag{4.191}$$

We know, by playing with the definition of the logarithm, that:

$$2^1 < 3 < 2^2 \tag{4.192}$$

therefore:

$$\log_2(2^1) < \log_2(3) < \log_2(2^2) \;\Rightarrow\; 1 < \log_2(3) < 2 \tag{4.193}$$

therefore $q_1 = 1$. Then we have:

$$\varepsilon_1 = \frac{\ln(y_{-1})}{\ln(y_0)} - q_1 \tag{4.194}$$

and as:

$$y_{-1} = u = 3, \qquad y_0 = a = 2 \tag{4.195}$$

it comes:

$$\varepsilon_1 = \frac{\ln(3)}{\ln(2)} - 1 \tag{4.196}$$

So we have the first partial quotient:

$$\log_2(3) = q_1 + \cfrac{1}{1/\varepsilon_1} = 1 + \cfrac{1}{\cfrac{1}{\ln(3)/\ln(2) - 1}} \tag{4.197}$$

Likewise, we already have:

$$y_1 = \frac{y_{-1}}{y_0^{\,q_1}} = \frac{3}{2^1} = \frac{3}{2} \tag{4.198}$$

Let us simplify:

$$\varepsilon_1 = \frac{\ln(3)}{\ln(2)} - \frac{\ln(2)}{\ln(2)} = \frac{\ln(3/2)}{\ln(2)} = \frac{\ln(y_1)}{\ln(y_0)} \tag{4.199}$$

So the first partial quotient can be written:

$$\log_2(3) = 1 + \cfrac{1}{\ln(2) / \ln(3/2)}$$

Set Theory

During our study of numbers, operators, and number theory (in the chapters of the same name), we often used the terms "groups", "rings", "fields", "homomorphism", etc., and we will continue to do so many times. Besides the fact that these concepts are of utmost importance for giving proofs or building mathematical concepts essential to the study of contemporary theoretical physics (quantum field theory, string theory, Standard Model, ...), they allow us to understand the components and the basic properties of mathematics and its operators by sorting them into separate categories.

So, presenting Set Theory as the 5th chapter of this book is a very questionable choice since, rigorously, it is where almost everything begins... However, we first needed to present Proof Theory for the notations and methods that will be used here.
Moreover, when modern mathematics was taught in secondary and primary schools (in the 1970s), the language of sets and the preliminary study of binary relations were introduced as a more rigorous approach to the notion of function and mapping, and to mathematics in general (see definition below).

Definition (#62): We call "arrow diagram" (or "sagittal diagram", from the Latin "sagitta" = arrow) any diagram showing a correspondence between two sets whose elements are connected wholly or partially by a set of arrows.

For example, the graphical representation of a function defined from the set $E = \{-3, -2, -1, 0, 1, 2, 3\}$ to the set $F = \{0, 1, 2, 4, 9\}$ leads to the sagittal diagram below:

A relation from $E$ to $E$ gives an arrow diagram of the type:

Figure 4.33 - Function returning into its own set of definition

A loop on each element shows a "reflexive relation", and the systematic presence of a return arrow indicates a "symmetric relation".

Definition (#63): If the target set is identical to the source set, we say that we have a "binary relation".

However, choosing to introduce Set Theory in school classrooms also had another reason. In fact, for the sake of internal rigor (i.e. not related to reality), a very large part of mathematics was rebuilt within a single axiomatic framework, called "Set Theory", in the sense that each mathematical concept (previously independent of the others) is reduced to a definition whose logical components all come from this same framework: it is regarded as fundamental! Thus, the rigor of reasoning carried out within Set Theory is guaranteed as long as the framework is "non-contradictory" or "consistent".

Let us now see the definitions that build this framework.

Definitions (#64):

D1. We call "set" any list, collection or gathering of well-defined objects, given explicitly or implicitly.

D2. A "Universe" $U$ is an object whose constituents are sets.
Note that what mathematicians call a "Universe" is not a set! In fact it is a model that satisfies the axioms of sets. Indeed, we will see that we cannot talk about the set of all sets (because it is not a set) to designate the object consisting of all sets, and that is why we talk about a "Universe".

D3. We call "elements" or "members of the set" the objects belonging to the set, and we write:

$$p \in A \tag{5.1}$$

if $p$ is an element of the set $A$ and, in the contrary case:

$$p \notin A \tag{5.2}$$

If $B$ is a "part" of $A$, or "subset" of $A$, we write:

$$B \subset A \quad\text{or}\quad A \supset B \tag{5.3}$$

Thus:

$$\forall x,\; x \in B \Rightarrow x \in A \tag{5.4}$$

Examples:

E1. $A = \{1, 2, 3\}$

E2. $X = \{x \mid x \text{ is a positive integer}\}$

D4. We can equip sets with a number of relations that compare their elements (sometimes useful...) or compare some of their properties. These relations are called "comparison relations" or "order relations" (see section Operators).

Remarks

R1. The structure of ordered set was originally set up in the framework of Number Theory by Cantor and Dedekind.

R2. As we proved in the chapter on Operators, $\mathbb{N}, \mathbb{Z}, \mathbb{Q}, \mathbb{R}$ are totally ordered by the usual relations $\le, \ge$. The relation $<$, often called "strict order", is not an order relation because it is not reflexive (see section Operators). For example, in $\mathbb{N}$ the relation "$a$ divides $b$", often denoted by the symbol "$|$", is a partial order.

R3. If $R$ is an order on $E$ and $F$ is a subset of $E$, the restriction of $R$ to $F$ is an order on $F$, called the "order induced by $R$ on $F$".

R4. If $R$ is an order on $E$, the relation $R'$ defined by:

$$x R' y \Leftrightarrow y R x$$

is an order on $E$, called the "reciprocal order" of $R$. The reciprocal order of the usual order $\le$ is the order denoted $\ge$, and the reciprocal order of the order "$a$ divides $b$" in $\mathbb{N}$ is the order "$a$ is a multiple of $b$".

The set is the basic mathematical entity whose existence is postulated: it is not defined in itself but through its properties, given by the axioms.
It relies on a human process: a kind of categorization ability that allows thought to distinguish several independent, qualified elements.

Theorem 4.26. We can prove from these concepts that the number of subsets of a set of cardinal $n$ is $2^n$.

Proof 4.26.1. First there is the empty set $\varnothing$, that is, 0 items chosen from $n$, i.e. $C_n^0$ (a notation of the binomial coefficient that does not comply with ISO 31-11!), where, as we have seen in the chapter Probabilities:

$$C_n^k = \binom{n}{k} = \frac{A_n^k}{k!} = \frac{n!}{k!\,(n-k)!} \tag{5.5}$$

then the $C_n^1$ singletons, the $C_n^2$ pairs, and so on... The number of subsets of $E$ (the cardinal of $\mathcal{P}(E)$) corresponds to the sum of all the binomial coefficients:

$$\mathrm{Card}(\mathcal{P}(E)) = \sum_{k=0}^{n} C_n^k \tag{5.6}$$

But we have (see section Algebraic Calculation):

$$(x + y)^n = \sum_{k=0}^{n} C_n^k x^k y^{n-k} \tag{5.7}$$

therefore:

$$\mathrm{Card}(\mathcal{P}(E)) = \sum_{k=0}^{n} C_n^k 1^k 1^{n-k} = (1 + 1)^n = 2^n \tag{5.8}$$

□ Q.E.D.

Example: Consider the set $S = \{x_1, x_2, x_3\}$; the set of all its parts $\mathcal{P}(S)$ consists of:

- The empty set: $\{\} = \varnothing$
- The singletons: $\{x_1\}, \{x_2\}, \{x_3\}$
- The pairs: $\{x_1, x_2\}, \{x_1, x_3\}, \{x_2, x_3\}$
- Itself: $\{x_1, x_2, x_3\}$

Such that:

$$\mathcal{P}(S) = \{\varnothing, \{x_1\}, \{x_2\}, \{x_3\}, \{x_1, x_2\}, \{x_1, x_3\}, \{x_2, x_3\}, \{x_1, x_2, x_3\}\} \tag{5.9}$$

which indeed makes 8 elements!

Remark: The order in which the elements are listed does not matter when counting the parts of the original set.

In Applied Mathematics, we work almost exclusively with sets of numbers. Therefore, we will limit our study of definitions and properties to these.

Now let us formalize the basic concepts for working with the most common sets we encounter in the basic school curriculum.

5.1 Zermelo-Fraenkel Axiomatic

The Zermelo-Fraenkel axiomatic, sometimes abbreviated "ZF-C axioms" and shown below, was formulated by Ernst Zermelo and Abraham Adolf Fraenkel in the early 20th century and completed by the axiom of choice (hence the capital C in ZF-C).
It is considered the most natural axiomatic structure in the context of set theory.

Remark: There are many other axiomatic structures, based on the more general concept of "class", as developed by von Neumann, Bernays and Gödel (for the notations, see section Proof Theory).

Strictly technically speaking..., the ZF axioms are statements of first-order predicate calculus with equality (see section Proof Theory), in a language with only one primitive symbol, for membership (a binary relation). The following should therefore only be seen as an attempt (...) to express in English the intended meaning of these axioms.

A1. Axiom of extensionality: Two sets are equal if and only if they have the same elements. This is what we write:

$$A = B \Leftrightarrow (\forall x \in A,\; x \in B) \wedge (\forall x \in B,\; x \in A) \tag{5.10}$$

So $A$ and $B$ are equal if every element $x$ of $A$ is also in $B$ and every element $x$ of $B$ also belongs to $A$.

A2. Axiom of the empty set: The empty set exists; we denote it:

$$\varnothing \tag{5.11}$$

and it has no element, so its cardinality is 0. In fact this axiom can be deduced from another axiom that we will see a little further on, but it is convenient to introduce it for teaching in high-school classes.

A3. Axiom of pairing: If $A$ and $B$ are two sets, then there exists a set $C$ containing $A$ and $B$, and only them, as elements. This set $C$ is then denoted $\{A, B\}$. From the perspective of the sets considered as elements, this gives:

$$\forall A\, \forall B\, \exists C : A \in C \wedge B \in C \tag{5.12}$$

This axiom also shows the existence of the "singleton", a set denoted:

$$\{X\} \tag{5.13}$$

which is a set whose only element is $X$ (and therefore with cardinal 1). We simply need to apply the axiom with $A$ equal to $B$.

A4. Axiom of the sum (also called "axiom of union"): This axiom allows us to build the union (merge) of sets. Said in a more common way: the union (merge) of any family of sets is... a set.
The union of any family of sets is often denoted:

$$\bigcup_i A_i \tag{5.14}$$

or, if we take the elements of a set $A$:

$$\bigcup_{x \in A} x \tag{5.15}$$

A5. Axiom of subsets (power set): It expresses that for any set $A$, the set of all its parts $\mathcal{P}(A)$ exists (not to be confused with the "P" of probability!). So to any set $A$ we can associate a set $B$ which contains exactly the parts $C$ (i.e. the subsets) of the first:

$$\forall A\, \exists B\, \forall C : (C \in B \Leftrightarrow C \subset A) \tag{5.16}$$

A6. Axiom of infinity: This axiom expresses the fact that there exists an infinite set. To formalize it, we say that there exists a set $A$, called an "autosuccessor set", containing $\varnothing$ (the empty set) and such that if $x$ belongs to $A$, then $x \cup \{x\}$ also belongs to $A$:

$$A \text{ is autosuccessor} \Leftrightarrow (\varnothing \in A) \wedge \big(x \in A \Rightarrow (x \cup \{x\}) \in A\big) \tag{5.17}$$

This axiom expresses, for example, that the set of natural numbers exists. Indeed, $\mathbb{N}$ is the smallest autosuccessor set in the sense of inclusion, $\mathbb{N} = \{\varnothing, \{\varnothing\}, \{\varnothing, \{\varnothing\}\}, \ldots\}$, and by convention we write (this is how we build the set of natural numbers):

$$0 = \varnothing,\qquad 1 = \{\varnothing\},\qquad 2 = \{\varnothing, \{\varnothing\}\} \tag{5.18}$$

A7. Axiom of regularity (also called "foundation axiom"): The main purpose of this axiom is to eliminate the possibility of a set $A$ being an element of itself. Thus, for any non-empty set $A$, there exists a set $B$ which is an element of $A$ such that no element of $A$ is an element of $B$ (one must distinguish the levels of language used: a set and its elements do not have the same status!), which we write:

$$\forall A \neq \varnothing : \exists B \in A,\; A \cap B = \varnothing \tag{5.19}$$

and thus we get the expected result:

$$\forall A,\; A \notin A \tag{5.20}$$

Proof 4.26.2. Indeed, let $A$ be a set such that $A \in A$. Consider the singleton $\{A\}$, the set whose only element is $A$. According to the axiom of foundation, this singleton must have an element that has no element in common with it. But its only element is $A$ itself, which means we must have:

$$A \cap \{A\} = \varnothing \tag{5.21}$$

But by hypothesis $A \in A$ and by construction $A \in \{A\}$. So:

$$A \in (A \cap \{A\}) \tag{5.22}$$

which contradicts the previous assertion. Therefore:

$$A \notin A \tag{5.23}$$

□ Q.E.D.

A8.
Axiom of replacement (also called "axiom schema of replacement"): This axiom expresses the fact that if a formula $f$ is functional, then for any set $A$ there is a set $B$ consisting precisely of the images of the elements of $A$ under this function. A little more formally: given the set $A$ of elements $a$ and a binary relation $f$ (which is, quite generally, functional), there exists a set $B$ consisting of the elements $b$ such that $f(a, b)$ is true. If $f$ is a function in which $b$ is not free, this means that:

$$b = f(a) \quad\text{and}\quad B = f(A) \tag{5.24}$$

Technically, we write this axiom as follows:

$$\forall A\; \Big[\big(\forall a \in A\; \exists! b : f(a, b)\big) \Rightarrow \exists B\; \forall a \in A\; \exists b \in B : f(a, b)\Big] \tag{5.25}$$

So, for every set $A$: if for every element $a$ it contains there is one and only one $b$ such that $f(a, b)$ holds, then there exists a set $B$ such that for every element $a$ of $A$ there is an element $b$ of $B$ with $f(a, b)$.

Let us see an example with the following binary predicate, which for any value $a$ of $A$ determines the value $b$ of $B$:

$$P(a, b) = (a = 1 \wedge b = 2) \vee (a = 3 \wedge b = 4) \tag{5.26}$$

Therefore, from the knowledge that $a$ equals 1 we derive that $b$ equals 2 and, similarly (i.e. by replacement), when $a$ equals 3 we derive that $b$ equals 4. This small example clearly shows the strong relation that exists when we consider the predicate $P$ as a naive function! Moreover, as there is an infinity of possible formulas $f$, the replacement schema is considered as an infinite number of axioms.

A9. Axiom of selection (also called "axiom schema of comprehension"): This axiom simply expresses that for any set $A$ and any property $P$ expressible in the language of set theory, the set of all elements of $A$ satisfying the property $P$ exists.

So, more formally, to any set $A$ and any condition or proposition $P(x)$, there is a set $B$ whose elements are exactly the elements $x$ of $A$ for which $P(x)$ is true.
This is what we write:

$$B = \{x \in A : P(x)\} \tag{5.27}$$

In a more complete and rigorous way, we have in fact, for any formula $f(a)$ in which $B$ does not occur as a free variable:

$$\forall A\, \exists B\, \forall a : \big(a \in B \Leftrightarrow a \in A \wedge f(a)\big) \tag{5.28}$$

It is typically the axiom that we use to construct the set of even numbers:

$$\{a \in \mathbb{N} \mid \exists b \in \mathbb{N},\; a = 2b\} \tag{5.29}$$

or to prove the existence of the empty set (which makes the axiom of the empty set redundant), because we just have to select, in an arbitrary set $A$, the elements satisfying the property:

$$x \neq x \tag{5.30}$$

No element satisfies this property, so the set obtained by the selection axiom is the empty set.

Compliance with the strict conditions of this axiom eliminates the paradoxes of "naive set theory", such as Russell's paradox or Cantor's paradox, which invalidated naive set theory. For example, consider the Russell set $R$ of all sets that do not contain themselves (note that we give a property of $R$ without specifying the set from which its elements are taken):

$$R = \{E : E \notin E\} \tag{5.31}$$

The problem is to know whether or not $R$ contains itself. If $R \in R$, then $R$ contains itself and, by definition, $R \notin R$; and vice versa. Each possibility is contradictory.

If we now denote by $C$ the set of all sets (Cantor's Universe), we have in particular, since every subset of $C$ is a set and hence an element of $C$:

$$\mathcal{P}(C) \subset C \tag{5.32}$$

which is impossible according to Cantor's theorem (see section Numbers), since a set always has a strictly smaller cardinal than the set of its parts.

These "paradoxes" (or "syntactic antinomies") come from non-compliance with the conditions of application of the selection axiom: to define $R$ (in Russell's example), there must be a proposition $P$ bearing on a given set $E$ from which the elements are selected, and this set should be made explicit. The proposition defining Russell's set, or Cantor's, does not indicate what this set $E$ is. It is therefore invalid!

A very nice and well-known example (this is why we present it) helps to understand this better (this is "Russell's paradox", which we already discussed at length in the section on Proof Theory): A young student went one day to his barber.
He struck up a conversation and asked him whether he had many competitors in his pretty city. In a seemingly innocent way, the barber replied: «I have no competition. Because of all the men of the city, I obviously do not shave those who shave themselves, but I have the privilege of shaving all those who do not shave themselves».

What in such a simple statement could catch out the logic of our smart young student? The answer is in fact innocent, until we decide to apply it to the case of the barber: does he shave himself, yes or no?

Suppose he shaves himself: he then belongs to the category of those who shave themselves, whom the barber said he of course does not shave... So he does not shave himself. Finally, this unfortunate barber is in a strange position: if he shaves himself, he does not shave himself, and if he does not shave himself, he shaves himself. This logic is self-destructive, stupidly contradictory, rationally irrational.

Then comes the selection axiom: we exclude the barber from the persons to whom the declaration applies. Because in reality, the problem is that the barber is a member of the set of all the men of the city. So what applies to all the men does not apply to the individual case of the barber.

A10. Axiom of choice: Given a set $A$ of non-empty, mutually disjoint sets, there exists a set $B$ (the set of choices for $A$) containing exactly one element of each member of $A$.

However, let us point out that the question of axiomatization, and therefore of foundations, was still shaken by two questions at the time of their construction: which valid axioms must be chosen, and, within a system of axioms, is mathematics coherent (is there not a risk of seeing a contradiction)?

The first issue was first raised by the continuum hypothesis: if we can put two sets of numbers in one-to-one correspondence, they have the same number of elements (cardinal).
We can thus map all the integers onto the rational numbers, as we have shown in the section on Numbers, so they have the same cardinality, but we cannot map the integers onto all the real numbers. The question then is whether or not there is a set whose number of elements lies between the two. This question is important for building the classical theory of analysis; mathematicians usually choose to say there is none, but we can also assume the opposite.

In fact, the continuum hypothesis is linked, in a more profound way than we might think, to the axiom of choice, which can also be formulated as follows: if $C$ is a collection of non-empty sets, then we can select an element from each set of the collection. If $C$ has a finite or a countable number of elements, the axiom seems pretty trivial: we can sort and number the sets of $C$, and the selection of an element in each set is simple. Where it begins to get complicated is when the set $C$ has the power of the continuum: how do we choose the elements if it is not possible to number them?

Finally, in 1938 Kurt Gödel showed that set theory remains consistent when the axiom of choice and the continuum hypothesis are added (provided it is consistent without them)! And to end it all, in 1963 Paul Cohen showed that their negations are also consistent with it, so that neither can be proved from the other axioms; in particular, the axiom of choice does not settle the continuum hypothesis.

Ok, to make a pedagogical summary of all this stuff, consider the following figure (excluding the axiom of choice):

Figure 4.34 - Zermelo-Fraenkel axioms visual summary (source: ?)
- AXIOM OF EXTENSION: If two sets have the same elements, then they are equal.
- AXIOM OF SEPARATION: We can form a subset of a set, which consists of some elements.
- EMPTY SET AXIOM: There is a set with no members, written as { } or $\varnothing$.
- PAIR-SET AXIOM: Given two objects $x$ and $y$ we can form a set $\{x, y\}$.
- UNION AXIOM: We can form the union of two or more sets.
- POWER SET AXIOM: Given any set, we can form the set of all subsets (the power set).
- AXIOM OF INFINITY: There is a set with infinitely many elements.
- AXIOM OF FOUNDATION: Sets are built up from simpler sets, meaning that every (non-empty) set has a minimal member.
- AXIOM OF REPLACEMENT: If we apply a function to every element in a set, the answer is still a set.

5.1.1 Cardinals

Definition (#65): Sets are said to be "equipotent" if there exists a bijection (one-to-one correspondence) between them. We then say they have the same "cardinal", which the ISO 31-11 standard recommends writing $\mathrm{card}(S)$, but in this book we will also use the notation $\mathrm{Card}(S)$ (many U.S. books use the non-official notations $|S|$, which looks exactly like the absolute value, or $\#S$).

Thus, more rigorously, a cardinal (which quantifies the number of items in the set) is an equivalence class (see section Operators) for the relation of equipotence.

Remark: Cantor is the main creator of set theory, in a form that we call today "naive set theory". But, apart from elementary considerations, his theory also involved higher levels of abstraction. The real novelty of Cantor's theory is that it allows us to talk about infinity. For example, an important idea of Cantor's was precisely to define "equipotence".

If we write $c_1 = c_2$ as an equality of cardinals, we mean that there are two equipotent sets $A$ and $B$ such that:

$$c_1 = \mathrm{Card}(A) \quad\text{and}\quad c_2 = \mathrm{Card}(B) \tag{5.33}$$

Cardinals can be compared. The order thus defined is a total order (see section Operators) between cardinals (the proof that the order relation is total uses the axiom of choice, and the proof that it is antisymmetric is known as the Cantor-Bernstein theorem, which we will prove further below). Saying that $c_1 < c_2$ means, in simple language, that $A$ is equipotent to a proper part of $B$, but $B$ is not equipotent to any part of $A$.
Mathematicians would say that $\mathrm{Card}(A)$ is smaller than or equal to $\mathrm{Card}(B)$ if there is an injection of $A$ into $B$.

We saw during our study of numbers (see section Numbers), especially of transfinite numbers, that a set equipotent to $\mathbb{N}$ (i.e. in bijection with it) is called a "countable set". Let us now see this notion in a little more detail:

Let $A$ be a set. If there is an integer $n$ such that each element of $A$ corresponds to exactly one item of the set $\{1, 2, \ldots, n\}$ and vice versa (rigorously this is a bijection... a concept that we will define later), then we say that the cardinal of $A$, denoted $\mathrm{Card}(A)$ or $\mathrm{card}(A)$, is a "finite cardinal" and that its value is $n$. Otherwise, we say that the set $A$ has an "infinite cardinal" and we write:

$$\mathrm{Card}(A) = +\infty \tag{5.34}$$

A set $A$ is "countable" if there is a bijection between $A$ and $\mathbb{N}$. A set $A$ is "at most countable" if there is a bijection between $A$ and a part of $\mathbb{N}$. An at most countable set is thus either of finite cardinal or countable.

We can therefore check the following propositions:

P1. A part of a countable set is at most countable.

P2. A set containing a non-countable set is also non-countable.

P3. The product of two countable sets is countable.

So any infinite subset of $\mathbb{N}$ is equipotent to $\mathbb{N}$ itself, which may seem counter-intuitive at first...! In particular, there are as many even natural numbers as natural numbers (use the bijection $f(n) = 2n$ from $\mathbb{N}$ to $P$, where $P$ is the set of even natural numbers). There are as many relative integers as natural integers, and as many integers as rational numbers (see the section on Numbers for the proofs). Thus we can write:

$$\mathrm{Card}(\mathbb{N}) = \mathrm{Card}(\mathbb{Z}) = \mathrm{Card}(\mathbb{Q}) = \aleph_0 \tag{5.36}$$

and, more generally, any infinite part of $\mathbb{Q}$ is countable. We also have an important result: any infinite set has an infinite countable part.
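Proposition P3 (for $\mathbb{N} \times \mathbb{N}$) and the equality $\mathrm{Card}(\mathbb{N}) = \mathrm{Card}(\mathbb{Z})$ can be made concrete with explicit bijections. The sketch below uses Cantor's pairing function $\pi(a, b) = (a+b)(a+b+1)/2 + b$, a standard choice that is not necessarily the construction used in the section on Numbers; the function names `pair`, `unpair` and `int_from_nat` are hypothetical. A finite check cannot prove bijectivity, it only illustrates it.

```python
from math import isqrt

def pair(a, b):
    """Cantor pairing N x N -> N: enumerates the diagonals a + b = 0, 1, 2, ..."""
    s = a + b
    return s * (s + 1) // 2 + b

def unpair(z):
    """Inverse of pair: find the diagonal s, then the position b on it."""
    s = (isqrt(8 * z + 1) - 1) // 2      # largest s with s(s+1)/2 <= z
    b = z - s * (s + 1) // 2
    return s - b, b

def int_from_nat(n):
    """Bijection N -> Z: 0, 1, 2, 3, 4, ... -> 0, 1, -1, 2, -2, ..."""
    return (n + 1) // 2 if n % 2 else -(n // 2)

print([unpair(z) for z in range(6)])        # [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
print([int_from_nat(n) for n in range(7)])  # [0, 1, -1, 2, -2, 3, -3]
```

Walking the diagonals is what makes the enumeration exhaustive: each pair $(a, b)$ is reached after finitely many steps, which is exactly what a bijection with $\mathbb{N}$ requires.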
Since we have proved in the section on Numbers that the set of real numbers has the "power of the continuum" and that the set of natural numbers has the transfinite cardinal $\aleph_0$, Cantor raised the question of whether there was a cardinal between the transfinite cardinal $\aleph_0$ and the cardinal of $\mathbb{R}$. In other words, we have an infinite amount of integers and an even greater amount of real numbers. So is there an infinity greater than the infinity of the integers and smaller than that of the real numbers?

The problem arose by writing $\aleph_0$ for the cardinal of $\mathbb{N}$ and $\aleph_1$ (new) for the smallest cardinal greater than $\aleph_0$, and offering to prove or disprove that the cardinal of $\mathbb{R}$, which is $2^{\aleph_0}$ according to the combinatorial law giving the number of subsets of a set (as we proved it before), is exactly:

$$\aleph_1 = 2^{\aleph_0} \tag{5.37}$$

For the rest of his life, Cantor tried in vain to prove this result, which we call the "continuum hypothesis". He did not succeed and descended into madness. At the 1900 International Congress of Mathematicians, Hilbert considered it one of the 23 major problems that should be solved in the 20th century.

This problem was solved in a rather surprising way. First, in 1938, one of the greatest logicians of the 20th century, Kurt Gödel, showed that Cantor's hypothesis could not be refuted, that is to say, we could never prove (within the usual axioms) that it was false. Then, in 1963, the mathematician Paul Cohen closed the debate: he proved that we could never prove that it was true!!! We can rightly conclude that Cantor went mad trying to prove a statement that could not be proved.

5.1.2 Cartesian Product

If $E$ and $F$ are two sets, we call "Cartesian product of $E$ by $F$" the set denoted $E \times F$ (not to be confused with the vector product notation) consisting of all possible pairs $(e, f)$ where $e$ is an element of $E$ and $f$ an element of $F$.
More formally:

$$E \times F = \{(e, f) \mid e \in E \wedge f \in F\} \tag{5.38}$$

We denote the Cartesian product of $E$ by itself:

$$E \times E = E^2 \tag{5.39}$$

and we then say that $E^2$ is the "set of pairs of elements of $E$".

We can form the Cartesian product of a sequence of sets $E_1 \times E_2 \times \cdots \times E_n$ and get all the $n$-tuples $(e_1, e_2, \ldots, e_n)$ where $e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n$. In the case where all the sets $E_i$ are identical to $E$, the Cartesian product is obviously denoted $E^n$. We then say that $E^n$ is the "set of all $n$-tuples of elements of $E$".

If $E$ and $F$ are finite, then the Cartesian product $E \times F$ is finite. Moreover:

$$\mathrm{Card}(E \times F) = \mathrm{Card}(E) \cdot \mathrm{Card}(F) \tag{5.40}$$

From this we see that if the sets $E_1, E_2, \ldots, E_n$ are finite, then the Cartesian product $E_1 \times E_2 \times \cdots \times E_n$ is finite and we have:

$$\mathrm{Card}(E_1 \times E_2 \times \cdots \times E_n) = \prod_{i=1}^{n} \mathrm{Card}(E_i) \tag{5.41}$$

In particular:

$$\mathrm{Card}(E^n) = [\mathrm{Card}(E)]^n \tag{5.42}$$

if $E$ is a finite set.

Examples:

E1. If $\mathbb{R}$ is the set of real numbers, then $\mathbb{R}^2$ is the set of all pairs of real numbers. In a plane equipped with a frame of reference, the coordinates of any point $M$ form an element of $\mathbb{R}^2$.

E2. When we roll two dice whose faces are numbered 1 through 6, each die can be symbolized by the set $E = \{1, 2, 3, 4, 5, 6\}$. The outcome of a roll of the two dice is then an element of $E^2 = E \times E$. The cardinal of $E \times E$ is 36: there are therefore 36 possible results when we roll two dice whose faces are numbered 1 to 6.

Remark: Set theory and the concept of cardinal are the theoretical basis of relational database software.

5.1.3 Intervals

Let $M$ be any set of numbers such that $M \subset \mathbb{R}$ (a particular but frequent example). We have the following definitions.

D1. $x \in \mathbb{R}$ is called an "upper bound" of the set $M$ if $x \ge m$ for all $m \in M$. Conversely, we speak of a "lower bound" (so do not confuse the concept of bound with the concept of interval!).

D2. Let $M \subset \mathbb{R}$ with $M \neq \varnothing$.
An element $x \in \mathbb{R}$ is called the "least upper bound" (or supremum) of $M$, denoted:

$$x = \sup M \tag{5.43}$$

if $x$ is an upper bound of $M$ and if for any upper bound $y \in \mathbb{R}$ of $M$ we have $x \le y$. Conversely, we speak of the "greatest lower bound" (or infimum), which we denote:

$$x = \inf M \tag{5.44}$$

The definitions carry over to functional analysis (see section of the same name), since functions are defined on sets. Indeed, let $f$ be a function defined on a set $E$:

$$f : E \to \mathbb{R} \tag{5.45}$$

and let $x_0 \in E$.

Definitions (#66):

D1. We say that $f$ has a "global maximum" at $x_0$ if:

$$\forall x \in E : f(x) \le f(x_0) \tag{5.46}$$

D2. We say that $f$ has a "global minimum" at $x_0$ if:

$$\forall x \in E : f(x) \ge f(x_0) \tag{5.47}$$

In each of these cases, we say that $f$ has a "global extremum" at $x_0$ (a concept that we often use in the sections on Analytical Mechanics and Numerical Methods!).

Figure 4.35 - Global/Local Maximum and Minimum example (source: Wikipedia)

D3. $f$ is "upper bounded" if there is a real number $M$ such that $\forall x \in I,\; f(x) \le M$. In this case, the function has a least upper bound on its domain of definition $I$, traditionally denoted:

$$\sup_I f \tag{5.48}$$

D4. $f$ is "lower bounded" if there is a real number $M$ such that $\forall x \in I,\; f(x) \ge M$. In this case, the function has a greatest lower bound on its domain of definition $I$, traditionally denoted:

$$\inf_I f \tag{5.49}$$

D5. We say that $f$ is "bounded" if it is both lower bounded and upper bounded (typically the case of the sine and cosine functions).

5.2 Set Operations

Starting from three sets $A, B, C$, we can build all the set operations (whose notations are due to Dedekind) existing in set theory (very useful in the study of probability and statistics).

Remark: Some of the notations below will be frequently used later in relatively complex theorems, so it is necessary to understand them thoroughly!

Thus, we can construct the following set operations:
5.2.1 Inclusion

In the simplest case, we define the "inclusion" as:

$$A \subset B \Leftrightarrow \forall x\, [x \in A \Rightarrow x \in B] \tag{5.50}$$

In non-specialized language, here is how to read it: $A$ is "included" in $B$ (is a "part" or a "subset" of $B$) if every $x$ belonging to $A$ also belongs to $B$:

Figure 4.36 - Visual example (Euler diagram) of the inclusion $A \subset B$, where the $U$ in the lower right corner of the figure represents the Cantor Universe.

From this follow these properties:

P1. If $A \subset B$ and $B \subset A$, then $A = B$, and conversely.

P2. If $A \subset B$ and $B \subset C$, then $A \subset C$.

5.2.2 Intersection

In the simplest case, we define the "intersection" as:

$$A \cap B = \{x \mid x \in A \wedge x \in B\} \tag{5.51}$$

In non-specialized language, here is how to read it: the "intersection" of the sets $A$ and $B$ consists of all the elements that are both in $A$ and in $B$:

[Figure: Euler diagram of $A \cap B$]

More generally, if $(A_i)$ is a family of sets indexed by $i \in I$, the intersection of the $(A_i)$, $i \in I$, is denoted:

$$\bigcap_{i \in I} A_i \tag{5.52}$$

This intersection is explicitly defined by:

$$\bigcap_{i \in I} A_i = \{x \mid \forall i \in I,\; x \in A_i\} \tag{5.53}$$

That is to say, the intersection of the indexed family contains all the $x$ that lie in every set of the family.

Given two sets $A$ and $B$, we say they are "disjoint" if and only if:

$$A \cap B = \varnothing \tag{5.54}$$

Furthermore, for finite sets:

$$A \cap B = \varnothing \Leftrightarrow \mathrm{Card}(A \cup B) = \mathrm{Card}(A) + \mathrm{Card}(B) \tag{5.55}$$

Mathematicians then denote the union:

$$A \sqcup B \tag{5.56}$$

and call it a "disjoint union". We sometimes joke that knowledge is built on disjunction... (those who understand will appreciate...).

Definition (#67): A collection $S = \{S_i\}$ of non-empty sets forms a "partition" of a set $A$ if the following properties hold:

P1. $\forall S_i, S_j \in S,\; i \neq j \Rightarrow S_i \cap S_j = \varnothing$

P2. $A = \bigcup_{S_i \in S} S_i$

Example: The set of even numbers and the set of odd numbers form a partition of $\mathbb{Z}$.
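Definition (#67) and relation (5.55) can be checked mechanically on finite sets. The sketch below uses Python's built-in `set` type; since $\mathbb{Z}$ is infinite, it works on the finite window $\{-10, \ldots, 10\}$, an assumption made only for illustration, and the helper name `is_partition` is hypothetical.

```python
def is_partition(parts, A):
    """Definition (#67): non-empty parts, pairwise disjoint (P1), whose union is A (P2)."""
    parts = list(parts)
    if any(not S for S in parts):                      # every part must be non-empty
        return False
    p1 = all(parts[i].isdisjoint(parts[j])             # P1: S_i & S_j is empty for i != j
             for i in range(len(parts)) for j in range(i + 1, len(parts)))
    p2 = set().union(*parts) == A                      # P2: A is the union of the S_i
    return p1 and p2

Z_window = set(range(-10, 11))                         # finite stand-in for Z (illustration only)
evens = {x for x in Z_window if x % 2 == 0}
odds = Z_window - evens

print(is_partition([evens, odds], Z_window))           # True
print(evens & odds == odds & evens == set())           # True: commutative, and the sets are disjoint
print(len(evens | odds) == len(evens) + len(odds))     # True: relation (5.55) for disjoint sets
```

Python's `%` returns a non-negative remainder for a positive modulus, so negative odd numbers such as $-3$ are correctly sent to `odds`.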
The intersection law is trivially a commutative law (see further below the definition of the concept of "law"), since:

$$A \cap B = B \cap A \tag{5.57}$$

5.2.3 Union

In the simplest case, we define the "union" (also sometimes called "merge") as:

$$A \cup B = \{x \mid x \in A \vee x \in B\} \tag{5.58}$$

In non-specialized language, here is how to read it: the "union" (or "merge") of the sets $A$ and $B$ is the set of the elements that are in $A$ plus those that are in $B$:

[Figure: Euler diagram of $A \cup B$]

More generally, if $(A_i)$ is a family of sets indexed by $i \in I$, the union of the $A_i$, $i \in I$, is denoted:

$$\bigcup_{i \in I} A_i \tag{5.59}$$

This union is explicitly defined by:

$$\bigcup_{i \in I} A_i = \{x \mid \exists i \in I,\; x \in A_i\} \tag{5.60}$$

That is to say, the union of the indexed family contains all the $x$ for which there is an index $i$ such that $x$ belongs to the set $A_i$.

We have the following distributive properties:

$$\left(\bigcup_{i \in I} A_i\right) \cap B = \bigcup_{i \in I} (A_i \cap B) \tag{5.61}$$

$$\left(\bigcap_{i \in I} A_i\right) \cup B = \bigcap_{i \in I} (A_i \cup B) \tag{5.62}$$

The union law $\cup$ is a commutative law (see further below the definition of the concept of "law"), since:

$$A \cup B = B \cup A \tag{5.63}$$

We also call "idempotence laws" the relations (note this for general culture):

$$A \cap A = A \tag{5.64}$$

$$A \cup A = A \tag{5.65}$$

and "absorption laws" the relations:

$$A \cap (A \cup B) = A \tag{5.66}$$

$$A \cup (A \cap B) = A \tag{5.67}$$

The laws of intersection and union are associative:

$$A \cap (B \cap C) = (A \cap B) \cap C \tag{5.68}$$

$$A \cup (B \cup C) = (A \cup B) \cup C \tag{5.69}$$

and distributive:

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \tag{5.70}$$

$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C) \tag{5.71}$$

If we recall the concept of "cardinal" (see above), we have, with the previously defined operations on finite sets, the following relation:

$$\mathrm{Card}(A \cup B) = \mathrm{Card}(A) + \mathrm{Card}(B) - \mathrm{Card}(A \cap B) \tag{5.72}$$

5.2.4 Difference

In the simplest case, we define the "difference" as:

$$A \setminus B = \{x \mid x \in A \wedge x \notin B\} \tag{5.73}$$
In non-specialized language, here is how to read it: the "difference" of the sets A and B consists of all the elements found only in A (thus excluding those of B).

5.2.5 Symmetric Difference

Let U be a set. For any subsets A and B of U, we define the "symmetric difference" $A \Delta B$ between A and B by:

$$A \Delta B = (A \setminus B) \cup (B \setminus A) \tag{5.74}$$

In non-specialized language, here is how to read it: the "symmetric difference" of the sets A and B consists of all the items that are only in A and those that are only in B (we leave aside the elements they have in common).

So, as we can see, we have:

$$A \Delta B = (A \cup B) \setminus (A \cap B) \tag{5.75}$$

Some trivial properties are given below:

P1. Commutativity: $A \Delta B = B \Delta A$

P2. Complementarity (see definition below): $A^c \Delta B^c = A \Delta B$

5.2.6 Product

In the simplest case, we define the "set product" or "Cartesian product" as:

$$A \times B = \{(x, y) \mid x \in A \wedge y \in B\} \tag{5.76}$$

In non-specialized language, here is how to read it: the "product" (not to be confused with multiplication or with the cross product of vectors) of two sets A and B is the set of pairs obtained by combining each element of one set with every element of the other. For example, the product $\mathbb{R} \times \mathbb{R}$ of the real numbers with themselves generates the plane, where each element is located by its X and Y coordinates.

We often meet product sets in mathematics and physics when we work with functions. For example, a function of two real variables with real output will be written:

$$f: \mathbb{R} \times \mathbb{R} \to \mathbb{R},\quad (x, y) \mapsto z \tag{5.77}$$

or more simply:

$$f(x, y) = z \tag{5.78}$$

5.2.7 Complementarity

In the simplest case, we define the "complement" as:

$$\forall A \subset U,\quad \bar{A} = \{x \mid x \in U,\ x \notin A\} \tag{5.79}$$

In non-specialized language, here is how to read it: given a set U and a subset A of U, the complement of A in U is the set of elements that are in U but not in A.
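The symmetric difference, the Cartesian product and the complement map directly onto Python's built-in set operations. The sketch below, assuming a small finite universe $U = \{0, \dots, 9\}$ chosen only for illustration, checks (5.74), (5.75) and property P2, and builds a product set (5.76) with `itertools.product`; the helper names are hypothetical.

```python
from itertools import product


def sym_diff(A, B):
    """A Δ B = (A minus B) ∪ (B minus A), as in (5.74)."""
    return (A - B) | (B - A)


def complement(A, U):
    """Complement of A in U (5.79): the elements of U that are not in A."""
    if not A <= U:
        raise ValueError("A must be a subset of U")
    return U - A


def cartesian(A, B):
    """A × B = {(x, y) | x in A and y in B}, as in (5.76)."""
    return set(product(A, B))


if __name__ == "__main__":
    U = set(range(10))  # small finite universe (assumption)
    A, B = {1, 2, 3, 4}, {3, 4, 5}
    print(sym_diff(A, B))                        # {1, 2, 5}
    print(sym_diff(A, B) == (A | B) - (A & B))  # True, relation (5.75)
    print(sym_diff(A, B) == A ^ B)              # True, Python's built-in operator
    # P2: the complements have the same symmetric difference
    print(sym_diff(complement(A, U), complement(B, U)) == sym_diff(A, B))  # True
    print(len(cartesian(A, B)) == len(A) * len(B))  # True
```

Python already provides `^` for the symmetric difference; the explicit `sym_diff` is kept only to mirror definition (5.74) term by term.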
Figure 4.41 - Visual example (Euler diagram) of the difference

Other notations for the complement sometimes found in the literature and in this book are (depending on the context, to avoid confusion with other objects):

$$\complement_U A \quad \text{or} \quad A^c \tag{5.80}$$

or, in the particular example above, we could also simply write $B \setminus A$.

For all $A_i$ included in any set B, we have the properties:

$$\complement_B\Big(\bigcup_{i \in I} A_i\Big) = \bigcap_{i \in I} \complement_B A_i \tag{5.81}$$

$$\complement_B\Big(\bigcap_{i \in I} A_i\Big) = \bigcup_{i \in I} \complement_B A_i \tag{5.82}$$

Here are some trivial properties of the complement:

$$\bar{\bar{A}} = A \tag{5.83}$$

$$A \cap \bar{A} = \varnothing \tag{5.84}$$

$$A \cup \bar{A} = U \tag{5.85}$$

There are other very important relations that also apply to Boolean logic (see section Logic Systems). If we consider three sets A, B, C, we have:

$$A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C) \tag{5.86}$$

$$A \setminus (B \setminus C) = (A \setminus B) \cup (A \cap C) \tag{5.87}$$

and the famous "De Morgan's laws" in set form (see section Logic Systems), given by the relations:

$$\overline{A \cap B} = \bar{A} \cup \bar{B} \qquad \overline{A \cup B} = \bar{A} \cap \bar{B} \tag{5.88}$$

Before moving on to another topic, we would like to point out that a significant number of working adults (mostly managers) who forgot the concepts defined above after leaving high school must study them again when they learn SQL (Structured Query Language), the most common language worldwide for querying corporate database servers in the 20th and 21st centuries. Most of them learn in training centers the following scheme for building queries with joins:

[SQL JOINS diagram: Venn-diagram panels for SELECT <select_list> FROM TableA A ... TableB B ON A.Key = B.Key, with INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, LEFT JOIN ... WHERE B.Key IS NULL, RIGHT JOIN ... WHERE A.Key IS NULL, and FULL OUTER JOIN ... WHERE A.Key IS NULL OR B.Key IS NULL]

Figure 4.42 - Common SQL query expressions with joins

5.3 Functions and Applications

Definition (#68): In mathematics, an "application" (or "function"), typically denoted f in analysis or A in linear algebra, is given by two sets, the departure set E and the arrival set F, together with a relation associating to each element x of the departure set one and only one element of the arrival set, called the "image of x by f" and written $f(x)$ in analysis; we write $f(E)$ for the set of the images of all the elements of E. The elements of $f(E)$ are called "images" and the elements of E are called "antecedents". We then say that f is an application from E to F, denoted:

$$f: E \to F \tag{5.89}$$

(remember the first arrow/sagittal diagram presented at the beginning of this section), or we also say that it is an application with arguments in E and values in F.

Remark: The term "function" is often used for applications with scalar numeric values, real or complex, that is to say when the arrival set is $\mathbb{R}$ or $\mathbb{C}$. We then speak of a "real function" or a "complex function". For vector values we prefer the word "application", as already mentioned in the definition.

Definitions (#69):

D1. The "graph" or "plot" (also called "graphic" or "representative") of an application or function $f: E \to F$ is the subset of the Cartesian product $E \times F$ consisting of the pairs $(x, f(x))$ for x varying in E. The graph of f determines its departure set (by projection on the first argument, often denoted x) and its image (by projection on the second argument, often denoted y).

D2. If the triplet $f = (E, F, \Gamma)$ is a function where E and F are two sets and $\Gamma \subset E \times F$ is a graph, E and F are respectively the source and the target of f. The "definition domain" or "departure set" of f is:

$$D_f = \{x \in E \mid \exists y \in F,\ (x, y) \in \Gamma\} \tag{5.90}$$

D3. Given three non-empty sets E, F, G, any function from $E \times F$ to G is called a "composition law" of $E \times F$ with values in G.

D4. An "internal composition law" (or simply "internal law") in E is a composition law of $E \times E$ with values in E (that is to say the case E = F = G).

Remark: Subtraction in $\mathbb{N}$ is not an internal composition law, although it is one of the four basic high-school arithmetic operations (for example $2 - 3 \notin \mathbb{N}$). Addition in $\mathbb{N}$, however, is such an internal law.

D5. An "external composition law" (or simply "external law") in E is a composition law of $F \times E$ with values in E, where F is a set distinct from E. In general, F is called the "scalar set".

Example: In the case of a vector space (see the definition much further below), the multiplication of a vector (whose components belong to a given set) by a real scalar is an example of an external composition law.

Remark: An external composition law with values in E is also called an "action of F on E". The set F is then called the "domain of operators", and we also say that F operates on E (keep in mind the example of vectors mentioned above).

D6. We call "image of f", denoted $\mathrm{Im}(f)$, the subset defined by:

$$\mathrm{Im}(f) = f(E) = \{y \in F \mid \exists x \in E,\ y = f(x)\} \tag{5.91}$$

Thus, "the image" of a function $f: E \to F$ is the collection of the $f(x)$ for x ranging over E. It is a subset of F.

And we call "kernel of f", denoted $\ker(f)$, the subset, very important in mathematics, defined by:

$$\ker(f) = f^{-1}(\{0\}) = \{x \in E \mid f(x) = 0\} \tag{5.92}$$

as illustrated by the following figure (you must understand this concept deeply, because we will reuse the kernel many times to prove theorems with important practical applications in various later chapters):

[Figure: ker(f) and Im(f), the image of f]

Figure 4.43 - ker concept of a function

Remarks:

R1. ker(f) derives from the German "Kern", simply meaning "kernel".

R2. Normally the notations Im and ker are reserved for homomorphisms of groups, rings and fields, and for linear applications between vector spaces, modules, etc. (see further below). We do not usually use them for arbitrary applications between arbitrary sets. But... this does not really matter at this level of the book.

Applications and functions can have a phenomenal number of properties. Below are some easy ones that are part of the general knowledge of the physicist (for more information about what a function is, see the section on Functional Analysis). Let f be an application or function from a set E to a set F; then we have the following properties:

P1. An application or function is said to be "surjective" if any element y of F is the image by f of at least (we emphasize the "at least") one element of E. We then say that f is a "surjection" from E to F. It follows from this definition that an application or function $f: E \to F$ is surjective if and only if $F = \mathrm{Im}(f)$. In other words, we also write this definition as:

$$\forall y \in F,\ \exists x \in E: y = f(x) \tag{5.93}$$

Figure 4.44 - Schematic representation of a surjective application or function

P2. An application or function is said to be "injective" if any element y of F is the image by f of at most (we emphasize the "at most") a single element of E. We then say that f is an "injection" from E to F. It follows from this definition that an application or function $f: E \to F$ is injective if and only if, for $x_1, x_2 \in E$, $f(x_1) = f(x_2)$ implies $x_1 = x_2$. In other words, an application or function for which two distinct elements always have distinct images is called injective. Equivalently, an application or function is injective if and only if any one of the following equivalent properties holds:

P2.1 $\forall (x, y) \in E^2: f(x) = f(y) \Rightarrow x = y$

P2.2 $\forall (x, y) \in E^2: x \neq y \Rightarrow f(x) \neq f(y)$

P2.3 $\forall y \in F$, the equation $y = f(x)$ in x has at most one solution in E

All this can be summarized by:

Figure 4.45 - Schematic representation of an injective application or function

P3. An application or function f from E to F is said to be "bijective" if it is both injective and surjective. In this case, for any element y of F, the equation $y = f(x)$ admits in E a single (neither "at least" nor "at most") pre-image x, which we also write:

$$\forall y \in F,\ \exists! x \in E: y = f(x) \tag{5.94}$$

This is illustrated by:

Figure 4.46 - Schematic representation of a bijective application or function

We are thus naturally led to define a new application from F to E, called the "inverse function" or "reciprocal function" of f and denoted $f^{-1}$, which maps every element y of F to the unique pre-image x in E (sometimes also called the "solution") of the equation $y = f(x)$. In other words:

$$x = f^{-1}(y) \tag{5.95}$$

The existence of an inverse (reciprocal) function implies that the graph of a bijective real function and that of its inverse are symmetric with respect to the line of equation $y = x$. Indeed, since $y = f(x)$ is equivalent to $x = f^{-1}(y)$, the point $(x, y)$ is on the graph of f if and only if the point $(y, x)$ is on the graph of $f^{-1}$.

Figure 4.47 - Bijective function example

This symmetry can be seen, for example, with the sine function (see section Trigonometry).

Example: Take the case of a holiday resort where a group of tourists must be housed in a hotel. Each way of allocating these tourists to hotel rooms may be represented by an application from the set of tourists to the set of rooms (each tourist is assigned a room).

• The tourists want the application to be injective, that is to say, each of them has a room of their own. This is only possible if the number of tourists does not exceed the number of rooms.

• The hotel manager hopes that the application is surjective, that is to say, every room is occupied. This is only possible if there are at least as many tourists as rooms.

• If it is possible to spread the tourists so that there is exactly one per room and all the rooms are occupied, the application is both injective and surjective, that is to say bijective.

Remarks:

R1. It follows from the definitions above that a real function $f: \mathbb{R} \to \mathbb{R}$ is bijective if and only if every horizontal line intersects its graph at exactly one point. This leads us to the following second remark:

R2. A continuous real function defined on an interval that satisfies the horizontal line test is strictly increasing or strictly decreasing on its whole domain.

P4. Let φ be an application or function from E to F and ψ an application or function from F to G. The application or function that associates to each element x of E the element $\psi(\varphi(x))$ of G is called the "composite application" (or "composite function") of φ and ψ, and is denoted by:

$$\psi \circ \varphi \tag{5.96}$$

where the symbol "∘" is read "round" (not to be confused with the scalar product we will see later in the section of Vector Calculus). Thus the above relation is read "psi round phi", although φ is applied first (...). So:

$$(\psi \circ \varphi)(x) = \psi(\varphi(x)) \tag{5.97}$$

Let, moreover, χ be an application from G to H. We check immediately that the composition of applications is associative (for more details see the section of Linear Algebra):

$$\chi \circ (\psi \circ \varphi) = (\chi \circ \psi) \circ \varphi \tag{5.98}$$

This allows us to omit the parentheses and write more simply:

$$\chi \circ \psi \circ \varphi \tag{5.99}$$

In the particular case where φ is an application or function from E to E, we write $\varphi^k$ for the composite $\varphi \circ \varphi \circ \dots \circ \varphi$ (k times).

What is important in what we have seen so far in this section is that all the properties listed above apply to the sets of numbers, as the following concrete and very powerful example shows.
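On finite sets, the properties P1 to P4 and the definitions of image (5.91) and kernel (5.92) can be tested directly. In the following Python sketch, an application $f: E \to F$ is represented, as an assumption of this illustration, by a dictionary whose keys form E, with the arrival set F passed separately; the function names are hypothetical, and `kernel` assumes that 0 plays the role of the neutral element of F. The tourist names in the hotel example are invented for the demonstration.

```python
def image(f):
    """Im(f) = f(E) (5.91): the set of all values f(x) for x in E."""
    return set(f.values())


def kernel(f, zero=0):
    """ker(f) = {x in E | f(x) = 0} (5.92); `zero` is assumed to be the neutral element of F."""
    return {x for x, y in f.items() if y == zero}


def is_injective(f):
    """P2: distinct elements of E have distinct images."""
    return len(set(f.values())) == len(f)


def is_surjective(f, F):
    """P1: every y in F has at least one antecedent, i.e. Im(f) = F."""
    return image(f) == set(F)


def is_bijective(f, F):
    """P3: both injective and surjective."""
    return is_injective(f) and is_surjective(f, F)


def inverse(f, F):
    """f^-1 : F -> E (5.95), which exists only when f is a bijection onto F."""
    if not is_bijective(f, F):
        raise ValueError("f is not a bijection onto F")
    return {y: x for x, y in f.items()}


def compose(psi, phi):
    """(psi o phi)(x) = psi(phi(x)) (5.97): phi is applied first."""
    return {x: psi[phi[x]] for x in phi}


if __name__ == "__main__":
    # x -> x mod 3 on E = {0, ..., 8}, with arrival set F = {0, 1, 2}
    f = {x: x % 3 for x in range(9)}
    print(image(f), kernel(f))                           # {0, 1, 2} {0, 3, 6}
    print(is_surjective(f, {0, 1, 2}), is_injective(f))  # True False

    # Hotel example: one room per tourist and every room occupied
    rooms = {101, 102, 103}
    allocation = {"Ann": 101, "Bob": 102, "Eve": 103}  # hypothetical tourists
    print(is_bijective(allocation, rooms))             # True
    print(inverse(allocation, rooms)[102])             # Bob
```

A dictionary can only describe an application on a finite set, so this kind of enumeration says nothing about infinite sets such as $\mathbb{N}$ or $\mathbb{Q}$, where a genuine proof is required.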
5.3.1 Cantor-Bernstein Theorem

Warning! This theorem, whose result may seem trivial, is not necessarily easy to approach (its mathematical formalism is not very aesthetic...). We advise you to read the proof slowly and to picture the sagittal diagrams in your head as you go.

Here is the statement to prove:

Theorem 4.27. Let X and Y be two sets. If there is an injection (remember the definition of an injective function or application above) from X to Y and another one from Y to X, then the two sets are in bijection (remember the definition of a bijective function or application above).

The relation "there exists an injection from X to Y" is therefore antisymmetric (up to bijection). This is illustrated by:

Figure 4.48 - Representation of an antisymmetric relation

For the proof we first need to demonstrate rigorously a lemma (intuitively obvious... but not formally), whose statement is as follows:

Lemma 4.27.1. Let X, Y, Z be three sets such that $X \subset Z \subset Y$. If X and Y are in bijection through a function f, then X and Z are in bijection through a function g.

An example of application of this lemma: the set of natural numbers and the set of rational numbers are in bijection (see the section of Number Theory for the proof). Therefore, since $\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q}$, the integers $\mathbb{Z}$ are also in bijection with the natural numbers $\mathbb{N}$.

Proof 4.27.1. First, formally, we take a bijective function f from Y to X:

$$f: Y \to X \tag{5.100}$$

To continue, we need a set A defined as the union of the images, under the iterated functions f (of the kind $f(f(f(\dots)))$), of the set Z from which we exclude the elements of X (a set we will denote $Z - X$ in this proof; remember that $Z \subset Y$). In other words (if the first formulation is not clear...), we define the set A as the union of the images of $(Z - X)$ by the applications $f \circ f \circ \dots \circ f$.
What we write:

$$A = \bigcup_{n=1}^{\infty} f^n(Z - X) \tag{5.101}$$

Because $f: Y \to X$ and $(Z - X) \subset Y$, we have by construction $A \subset X$ and thus $((Z - X) \cup A) \subset Z$. Note that we also have:

$$A = \bigcup_{n=1}^{\infty} f^n(Z - X) \Rightarrow f(A) = f\Big(\bigcup_{n=1}^{\infty} f^n(Z - X)\Big) = \bigcup_{n=1}^{\infty} f\big(f^n(Z - X)\big) = \bigcup_{n=1}^{\infty} f^{n+1}(Z - X) \tag{5.102}$$

and by reindexing:

$$f(A) = \bigcup_{n=2}^{\infty} f^n(Z - X) \tag{5.103}$$

We then have (picturing the arrow diagrams in your head can help at this stage of the proof...):

$$f((Z - X) \cup A) = A \tag{5.104}$$

We can elegantly demonstrate this last relation:

$$f((Z - X) \cup A) = f(Z - X) \cup f(A) = f(Z - X) \cup \bigcup_{n=2}^{\infty} f^n(Z - X) = \bigcup_{n=1}^{\infty} f^n(Z - X) = A \tag{5.105}$$

Since Z can be partitioned (nothing stops us from doing this!) into the two disjoint subsets $(Z - X) \cup A$ and $(X - A)$, and without forgetting that $X \subset Z \subset Y$ and $A \subset X$, we define a function g (we give no more information about it yet) such that:

$$g: Z \to X \tag{5.106}$$

and for every pre-image a of g in the part $((Z - X) \cup A) \subset Z$ we set:

$$\forall a \in ((Z - X) \cup A),\quad a \mapsto f(a) \tag{5.107}$$

This means that, because $((Z - X) \cup A) \subset Z$ and $Z \subset Y$, we can apply the bijective function f (remember that $f: Y \to X$) in place of g to any element of $((Z - X) \cup A)$. We also set, for every pre-image a of g in the part $(X - A)$ (remember that $A \subset X$):

$$\forall a \in (X - A),\quad a \mapsto a \tag{5.108}$$

The application g is then bijective, because its restrictions to $((Z - X) \cup A)$ and $(X - A)$ are f, which maps the first part bijectively onto A by (5.104), and the identity of $X - A$; since A and $X - A$ partition X, these two pieces together form a bijection. Finally, by construction, there exists a bijection between X and Z.

□ Q.E.D.

Now that we have proved the lemma, let us recall the assumptions of the Cantor-Bernstein theorem and use the result of the lemma: consider φ an injection from X to Y and ψ an injection from Y to X.
We thus have:

$$\varphi(X) \subset Y \quad \text{and} \quad \psi(Y) \subset X \tag{5.109}$$

so (we recognize here the statement of the lemma):

$$\psi(\varphi(X)) \subset \psi(Y) \subset X \tag{5.110}$$

[Formula residue: definitions of magma, monoid, group, ring and field; field axioms of $(\mathbb{R}, +, \times)$ and order properties of $\leq$; complex-number arithmetic; vector space, K-algebra and homomorphism identities.]

6 Probabilities

Probability is the measure of the likelihood that an event will occur, and the calculus of probabilities therefore handles random phenomena (called more aesthetically "stochastic processes" when they are time-dependent), that is to say phenomena that do not always lead to the same outcome and that can be studied through numbers, their implications and their occurrences. However, even if these phenomena have variable outcomes depending on chance, we observe a certain statistical regularity.

Probability is quantified as a number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty). The higher the probability of an event, the more certain we are that the event will occur.

The concepts related to probabilities have been given an axiomatic mathematical formalization in probability theory (see further below), which is used widely in areas such as mathematics, statistics, finance, gambling, science (in particular physics), artificial intelligence/machine learning, computer science, game theory and philosophy, for example to draw inferences about the expected frequency of events. Probability theory is also used to describe the underlying mechanics and regularities of complex systems.

Definitions (#70): There are several ways to define a probability. Mainly we talk about:

D1. "Experimental or inductive probability", which is the probability derived from observation of the whole population.

D2. "Theoretical or deductive probability", which is the probability known through the study of the underlying phenomenon, without experimentation. It is therefore "a priori" knowledge, as opposed to the previous definition, which rather refers to a notion of "a posteriori" probability.
As it is not always possible to determine a priori probabilities, we are often asked to perform experiments. We must then be able to pass from the first notion to the second. This passage is assumed to be possible as a limit (with a population sample whose size approaches the size of the whole population).

The formal modeling of the probability calculus was introduced by A.N. Kolmogorov in a book published in 1933. This model is based on the probability space $(U, \mathcal{A}, P)$, which we will define a little further on and which can be related to measure theory (see section Measure Theory). However, probabilities had already been studied from a scientific point of view by Fermat and Pascal in the mid-17th century.

Remark: If you have a teacher or trainer who dares to teach statistics and probabilities with examples based on gambling (cards, dice, matches, coin tosses, etc.), report it to whom it may concern, because it would mean that he has no experience in the field and will teach you anything in any way (examples should normally be based on industry, economics or R&D, in short: areas used daily in companies, but certainly not on gambling...!).

6.1 Event Universe

Definitions (#71):

D1. The "universe of events", or "universe of observables", U is the set of all possible outcomes (results), called "elementary events", of a given random trial. The universe is discrete (finite or countably infinite) if the elementary events are finite or countable in number, and continuous (uncountable) otherwise.

D2. An "event" A is a set of elementary events, that is, a part of the universe U. An event may consist of a single elementary event.

Example: Consider the universe U of all possible blood groups. The event A "the individual is Rh positive" is represented by:

$$A = \{A+, B+, AB+, O+\} \subset U$$

while the event B "the individual is the universal donor" is represented by:

$$B = \{O-\} \subset U$$

which is thus an elementary event.

D3. Let U be a universe and A an event. We say that the event A "occurs" (or "is realized") if the outcome i ($i \in U$) of the trial satisfies $i \in A$. Otherwise, we say that A "was not realized".

D4. The empty subset $\varnothing$ of U is called the "impossible event". Indeed, whatever the outcome i of a trial, we never have $i \in \varnothing$, so the event $\varnothing$ never occurs.

If U is finite or countably infinite, any subset of U is an event; this is no longer true if U is uncountable (we will see why in the chapter Statistics).

D5. The set U is also called the "certain event". Indeed, whatever the outcome i of the trial, we always have $i \in U$ (since U is the universe of events), so the event U always occurs.

D6. Let A and B be two subsets of U. Then $A \cup B$ and $A \cap B$ are also subsets of U, and therefore events: "A or B" and "A and B" respectively. If two events A and B are such that:

$$A \cap B = \varnothing \tag{6.1}$$

the two events cannot both be realized during the same trial, and we say that they are "mutually exclusive" (or "incompatible") events. Otherwise, if:

$$A \cap B \neq \varnothing \tag{6.2}$$

the two events can both be realized during the same trial (seeing a black cat while passing under a ladder, for example), and we say conversely that they are "compatible" events.

6.2 Kolmogorov's Axioms

The probability of an event corresponds somehow to the notion of frequency of a random phenomenon; in other words, to each event we attach a real number in the interval [0, 1] that measures its probability (chance) of realization. The properties of frequencies observed over various trials allow us to determine the properties of probabilities.

Let U be a universe. We say that we define a probability on the events of U if to any event A of U we associate a number or measure $P(A)$, called the "a priori probability of event A" or "marginal probability of A", such that:

A1. For any event A:

$$0 \leq P(A) \leq 1 \tag{6.3}$$

Thus, the probability of any event is a real number between 0 and 1 inclusive (this is common sense...).

A2. The probability of the certain event, that is of the set of all possible outcomes, is equal to 1:

$$P(U) = 1 \tag{6.4}$$

A3. If $A \cap B = \varnothing$, that is if the two events are incompatible (disjoint), then:

$$P(A \cup B) = P(A) + P(B) \tag{6.5}$$

The probability of the union ("or") of two mutually incompatible (mutually exclusive) events is therefore equal to the sum of their probabilities (law of addition). We then speak of "disjoint probability". We now better understand why the third axiom requires $A \cap B = \varnothing$: otherwise the common part would be counted twice and the sum of the probabilities could exceed 1 (picture again the set diagram of the two events!).

Example: Consider that in a given area, over 50 years, the probability of a major earthquake is 5% and over the same period the probability of a major flood is 10%. We would like to know the probability that a nuclear plant meets at least one of the two events during this period if they are incompatible... The total probability is then the sum of the two probabilities, that is 15%...

We will find an example of this kind of disjoint probability in the chapter of Industrial Engineering when studying F.M.E.A. (Failure Modes and Effects Analysis) for the fault analysis of systems with a complex structure.

More generally, if $(A_i)_{i \in \mathbb{N}}$ is a sequence of pairwise disjoint events ($A_i$ and $A_j$ cannot occur at the same time when $i \neq j$), then:

$$P\Big(\bigcup_{i \in \mathbb{N}} A_i\Big) = \sum_{i \in \mathbb{N}} P(A_i) \tag{6.6}$$

We then speak of "σ-additivity", because, looking more closely at the three axioms above, P is a measure defined on a σ-algebra of events (see section Measure Theory).
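For a finite universe, a probability satisfying the axioms A1 to A3 can be built by assigning non-negative weights to the elementary outcomes. The Python sketch below assumes such a model, with integer weights and exact `fractions.Fraction` arithmetic; the outcome names and the weights (5, 10 and 85 percent for "earthquake only", "flood only" and "neither") are illustrative assumptions that mirror the incompatible earthquake/flood example above, and `make_probability` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import combinations


def make_probability(weights):
    """Return a function P defined on the events (subsets) of a finite universe,
    built from non-negative weights attached to the elementary outcomes."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")

    def P(event):
        unknown = set(event) - set(weights)
        if unknown:
            raise ValueError(f"outcomes outside the universe: {unknown}")
        # P(A) = sum of the normalized weights of the outcomes in A, so that P(U) = 1
        return sum((Fraction(weights[o]) / total for o in event), Fraction(0))

    return P


if __name__ == "__main__":
    # Illustrative weights in percent; the two disasters are assumed incompatible
    weights = {"earthquake": 5, "flood": 10, "neither": 85}
    P = make_probability(weights)
    U = set(weights)
    A, B = {"earthquake"}, {"flood"}

    every_event = [set(c) for r in range(len(U) + 1) for c in combinations(sorted(U), r)]
    print(all(0 <= P(e) <= 1 for e in every_event))  # True  (axiom A1)
    print(P(U))                                       # 1     (axiom A2)
    print(P(A | B) == P(A) + P(B))                    # True  (axiom A3, A and B disjoint)
    print(P(A | B))                                   # 3/20, i.e. 15%
```

Exact fractions avoid the rounding noise that floating-point weights would introduce when checking equalities such as A2 and A3.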
At the opposite, if the events are not incompatibles (they can overlap or in other words: they have a joint probability), we then have for probability that at most one of the two takes place: P(AuB) = P(A) + P(B)-P(AnB) (6.7) This means that the probability that at most one of the events A or B occurs is equal to the sum of the probabilities for the realization of A or B occurred, minus the probability that A and B occurred simultaneously (we will show later that this is simply equal to the probability that the two do not occur at the same time!). ^Example: Consider that in a given area, over 50 years, the probability of a major earthquake is 5% and on the same period the probability a major flood is 10%... We would like to know what is the probability that a nuclear plant meets at most one of two events during the same period if the are not incompatibles. We then calculate the probability that from the above equation that gives 14.5%... 272/5785 info @ sciences. ch EAME v3. 5-2013 4. Arithmetic And thus if they were incompatibles we would have then A n B = 0 we find again the disjoint probability: P(AUB) = P(A) + P(B) ( 6 . 8 ) An immediate consequence of the axioms (A2) and (A3) is the relations between the probability of an event A and its complement, noted A (or more rarely in accordance with the notation used in the chapter of Proof Theory the complementary may be noted ->A): P(A ) = 1 - P(A) (6.9) Let U be a universe with a finite number of n possible outcomes: U ■••An} (6.10) where the events: h = { a } ,h = {k},h = {* 3 } , -,In = {in} ( 6 . 11 ) are called "elementary events". When these events have the same probability, we say they are "equiprobables". In this case, it is very easy to calculate the probability. Indeed, these events being by definition incompatible with each other at this level of our discussion, we have under the third axiom (A3) of probabilities: P (h U I 2 U ... U A) = p(h) + P(I 2 ) + ... 
+ P(I n ) (6.12) but since: P(J 1 UJ 2 U...U/ n ) = P{U) = 1 (6.13) and that the probability of the right hand are by hypothesis equiprobable, we have: P(h) = P(I 2 ) = ... = P(I n ) = - (6.14) n Definition (#72): If A and B are not mutually exclusive but independent, we know that by their compatibility A n B = 0, that (very important in statistics!): P(A n B) = P(A) ■ P(B) (6.15) the probability of the intersection ("and" operator) of two independent events is equal to the product of their probabilities (law of multiplication). We name it "joint probability" (this is the most common case). info @ sciences. ch 273/5785 4. Arithmetic EAME v3. 5-2013 ^Example: Consider that in a given area, over 50 years, the probability of a major earthquake is 5% and on the same period the probability a major flood is 10%. .. Assume that these two events are not mutually exclusive. In other words that they are compatible. We will be interested to their independence. Thus, we would like to know what is the probability that a nuclear power plants meets the two events at the same time, at any time, during this same period. We then calculate the probability from the above equation that gives 0.05%... Under a more general form, the events A 1; A 2 , A n are independent if the probability of the intersection is the product of the probabilities: ( n \ n nunw i= 1 ) i = 1 (6.16) Remark Be careful to not confuse "independent" and "incompatible" ! So far to summarize a bit we have: Type Expression 2 incompatibles events (disjoints) P(AUB) = P(A) + P(B) 2 not incompatibles events (joints) P(A U B) = P(A ) + P(B) - P(A n B) 2 not incompatibles but independents events P(A n B) = P(A) ■ P(B) J Table 4.14 - Classical cases of probabilities Thanks the above definition, we can show that the probability that either A or B is to take place (e.g. at least one of the two but not both at the same time), is simply equal to... 
one minus the probability that neither of the two occurs:

$$\begin{aligned} P(A\cup B) &= P(A) + P(B) - P(A\cap B) = P(A) + P(B) - P(A)P(B) \\ &= 1 - P(\bar{A})P(\bar{B}) = 1 - (1-P(A))(1-P(B)) \end{aligned} \tag{6.17}$$

We can also use this definition to determine the probability that exactly one of the two events occurs (the "exclusive or", noted $\oplus$):

$$\begin{aligned} P(A\oplus B) &= P(A)P(\bar{B}) + P(\bar{A})P(B) = P(A)(1-P(B)) + P(B)(1-P(A)) \\ &= P(A) + P(B) - 2P(A)P(B) = P(A) + P(B) - 2P(A\cap B) \end{aligned} \tag{6.18}$$

Example: Consider that in a given area, over 50 years, the probability of a major earthquake is 5% and, over the same period, the probability of a major flood is 10%. We would like to know the probability that a nuclear power plant meets exactly one of the two events during this period, assuming as before that they are independent. The above equation gives $5\% + 10\% - 2\cdot 0.5\% = 14\%$.

There is a common and important area in industry where the four following relations are frequently used (the first one assuming independent events):

$$\begin{aligned} \text{AND} &= P(A)\cdot P(B) \\ \text{OR (compatible)} &= P(A) + P(B) - P(A\cap B) \\ \text{OR (incompatible)} &= P(A) + P(B) \\ \text{XOR} &= P(A) + P(B) - 2P(A\cap B) \end{aligned} \tag{6.19}$$

This is "fault tree analysis" (or "probabilistic tree analysis"), which is used to analyse the possible causes of failure of a system of any kind (industrial, administrative or other).

To close this part of the chapter, consider the following figure displaying Venn diagrams (see section Set Theory) for all 16 events (including the impossible event) that can be described in terms of two given events A and B. In each case, the event is represented by the red area.

Figure 4.49 - Possible Venn diagrams for two events

Consider the situation where A represents an earthquake, B a major flood and U the universe of all dramatic events for a nuclear power plant. We consider that the two events are independent. Then each of the 16 events can be described as follows, either mathematically or verbally:

1. U: An earthquake, a flood, both together, nothing or any other event can occur (in short: any event can occur).
$$P(U) = 1 = 100\% \tag{6.20}$$
2. $A\cup B$: Any event with an earthquake, a flood or both together can occur.
$$P(A\cup B) = P(A) + P(B) - P(A\cap B) = P(A) + P(B) - P(A)P(B) \tag{6.21}$$
3. $A\cup B^c$: Any event can occur except those with a flood not associated with an earthquake.
$$P(A\cup B^c) = P(U) - P(B) + P(A\cap B) = 1 - P(B) + P(A)P(B) \tag{6.22}$$
4. $A^c\cup B$: Any event can occur except those with an earthquake not associated with a flood.
$$P(A^c\cup B) = P(U) - P(A) + P(A\cap B) = 1 - P(A) + P(A)P(B) \tag{6.23}$$
5. $A^c\cup B^c$: Any event can occur except those associating an earthquake with a flood.
$$P(A^c\cup B^c) = P(U) - P(A\cap B) = 1 - P(A)P(B) \tag{6.24}$$
6. A: Any event with an earthquake can occur (this includes the events associating an earthquake and a flood).
$$P(A) = P(A) \tag{6.25}$$
7. B: Any event with a flood can occur (this includes the events associating a flood and an earthquake).
$$P(B) = P(B) \tag{6.26}$$
8. $(A\cap B)\cup(A^c\cap B^c)$: Any event can occur except those including an earthquake without a flood or a flood without an earthquake.
$$P((A\cap B)\cup(A^c\cap B^c)) = P(U) - P(A) - P(B) + 2P(A\cap B) = 1 - P(A) - P(B) + 2P(A)P(B) \tag{6.27}$$
9. $(A\cap B^c)\cup(A^c\cap B)$: Any event including an earthquake without a flood or a flood without an earthquake can occur.
$$P((A\cap B^c)\cup(A^c\cap B)) = P(A) + P(B) - 2P(A\cap B) = P(A) + P(B) - 2P(A)P(B) \tag{6.28}$$
10. $B^c$: Any event except those including a flood can occur.
$$P(B^c) = P(U) - P(B) = 1 - P(B) \tag{6.29}$$
11. $A^c$: Any event except those including an earthquake can occur.
$$P(A^c) = P(U) - P(A) = 1 - P(A) \tag{6.30}$$
12. $A\cap B$: Any event associating an earthquake and a flood together can occur.
$$P(A\cap B) = P(A)P(B) \tag{6.31}$$
13. $A\cap B^c$: Any event with an earthquake and without a flood can occur.
$$P(A\cap B^c) = P(A) - P(A)P(B) \tag{6.32}$$
14. $A^c\cap B$: Any event with a flood and without an earthquake can occur.
$$P(A^c\cap B) = P(B) - P(A)P(B) \tag{6.33}$$
15. $A^c\cap B^c$: Any event can occur except those including an earthquake and/or a flood.
$$P(A^c\cap B^c) = P(U) - P(A\cup B) = 1 - P(A) - P(B) + P(A)P(B) \tag{6.34}$$
16. $A\cap A^c$ or $B\cap B^c$: Impossible event.
$$P(A\cap A^c) = P(B\cap B^c) = P(\varnothing) = 0 \tag{6.35}$$

6.3 Conditional Probabilities

What can we infer about the probability of an event B knowing that an event A has occurred, when there is a link between A and B? In other words, if there is a link between A and B, the occurrence of A should change our knowledge of B, and we want to know whether it is possible to define the probability of an event conditionally on (relatively to) another event.

This type of probability is called "conditional probability" or "a posteriori probability" of B knowing A, and is denoted, in the context of the study of conditional probabilities:

$$P(B/A) \tag{6.36}$$

and often in practice, to avoid confusion with a possible division:

$$P(B\,|\,A) \tag{6.37}$$

and we sometimes find in U.S. books the notation P(B A A) (6.38), or also Pb(A) (6.39).

We also have the case:

$$P(A/B) \tag{6.40}$$

which, when B is regarded as a hypothesis and A as the observed data, is called the "likelihood" of B given A.

Historically, the first mathematician to have used the correct notion of conditional probability was Thomas Bayes (1702-1761). This is why we often say "Bayes" or "Bayesian" probabilities as soon as conditional probabilities are involved: "Bayes formula", "Bayesian statistics", etc.
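Before going further, the two-event formulas established just before this section can be checked numerically. The following Python sketch assumes, as in the nuclear power plant examples, that the earthquake A (5% over 50 years) and the flood B (10%) are independent; the function names are hypothetical, chosen for illustration only. It evaluates the "or", "and" and "exclusive or" probabilities (items 2, 12 and 9 of the list) and the probability that neither event occurs (item 15).

```python
from math import isclose


def union(p_a, p_b):
    """P(A or B) for two independent events, relation (6.21)."""
    return p_a + p_b - p_a * p_b


def intersection(p_a, p_b):
    """P(A and B) for two independent events, relation (6.15)."""
    return p_a * p_b


def exclusive_or(p_a, p_b):
    """P(exactly one of A and B) for two independent events, relation (6.18)."""
    return p_a + p_b - 2 * p_a * p_b


def neither(p_a, p_b):
    """P(neither A nor B), relation (6.34): complement of the union."""
    return 1 - union(p_a, p_b)


p_quake, p_flood = 0.05, 0.10  # probabilities over 50 years, from the examples

print(f"at least one: {union(p_quake, p_flood):.3%}")         # 14.500%
print(f"both        : {intersection(p_quake, p_flood):.3%}")  # 0.500%
print(f"exactly one : {exclusive_or(p_quake, p_flood):.3%}")  # 14.000%
print(f"neither     : {neither(p_quake, p_flood):.3%}")       # 85.500%

# "Both", "exactly one" and "neither" partition the universe U,
# and the union equals 1 - P(not A) P(not B), as in (6.17).
assert isclose(intersection(p_quake, p_flood) + exclusive_or(p_quake, p_flood)
               + neither(p_quake, p_flood), 1.0)
assert isclose(union(p_quake, p_flood), 1 - (1 - p_quake) * (1 - p_flood))
```

The final assertions check that the three cases "both", "exactly one" and "neither" cover the whole universe, as the Venn diagrams suggest.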
The notion of conditional probability that we will introduce is much less simple than it first appears, and conditional problems are an inexhaustible source of errors of all kinds (there are famous paradoxes on the subject, and even experts need peer review to minimize mistakes).

Let's start with a simple example: suppose we have two dice and that we have only thrown the first one. We want to know the probability that, by throwing the second die, the sum of the two numbers reaches at least a given minimum value. The probability of reaching this minimum value given the value of the first die is in general quite different from the probability of reaching it when throwing the two dice at the same time. How can we calculate this new probability?

Let us now formalize the process! After the throw of the first die, we have the event:

$$A = \{\text{the result of the first throw is } \ldots\} \tag{6.41}$$

Under the hypothesis that $B\subset A$, we feel that $P(B/A)$ must be proportional to $P(B)$, the proportionality constant being determined by the normalization:

$$P(A/A) = 1 \tag{6.42}$$

Now let $B\subset A^c$ (B is included in the complement of A, so that the events are mutually exclusive). It is then rather intuitive that, under this hypothesis of incompatibility, the conditional probability is:

$$P(B/A) = 0 \tag{6.43}$$

This leads us to the following definitions of the conditional probabilities of B given A and of A given B:

$$P(B/A) = \frac{P(A\cap B)}{P(A)} \qquad P(A/B) = \frac{P(A\cap B)}{P(B)} \tag{6.44}$$

Thus, knowing that A has occurred reduces the universe of possible outcomes for B from U to A. From then on, only the events of the form $A\cap B$ matter, so the probability of B given A (or vice versa, by symmetry) must be proportional to $P(A\cap B)$! The coefficient of proportionality is the denominator, and it ensures that the certain event has probability 1 (for example $P(A/A) = 1$).

If two events A and B are independent (think of the black cat and the ladder, for example), then we have:

$$P(A\cap B) = P(A)P(B) \tag{6.45}$$

and we see that $P(B/A)$ is equal to $P(B)$: the event A adds no information about B, and vice versa! In other words, if A and B are independent, we have:

$$P(B/A) = P(B) \quad\text{and}\quad P(A/B) = P(A) \tag{6.46}$$

Another fairly intuitive way to see things is to represent the probability measure P as a measure of the areas of subsets of $\mathbb{R}^2$. Indeed, if A and B are two subsets of respective areas P(A) and P(B), then, to the question of the probability that a point of the plane belongs to B knowing that it belongs to A, it is quite obvious that the answer is:

$$P(B/A) = \frac{\text{Surface}(A\cap B)}{\text{Surface}(A)} \tag{6.47}$$

We would like to point out that the definition of conditional probability is often used in the following way:

$$P(A\cap B) = P(A/B)P(B) = P(B/A)P(A) \tag{6.48}$$

called the "formula of compound probabilities". Thus, the a posteriori probability of B knowing A can also be written as:

$$P(B/A) = \frac{P(A/B)P(B)}{P(A)} \tag{6.49}$$

The way this formula gives an update of the probability of a hypothesis B in light of some body of data A is named the "diachronic interpretation". "Diachronic" means that something is happening over time; in this case the probability of the hypothesis changes, over time, as we see new data. In this interpretation the different terms have a name:

• P(B) is the probability of the hypothesis before we see the data, named, as we already know, the "prior probability" or just "prior".
• P(B/A) is what we want to compute, the probability of the hypothesis after we see the data, named, as we already know, the "posterior".
• P(A/B) is the probability of the data under the hypothesis, named the "likelihood".
• P(A) is the probability of the data under any hypothesis, named the "normalizing constant".

Example: Suppose a disease like meningitis.
The probability of having meningitis will be denoted by P(M) = 0.001 (arbitrary value for the example), and the probability of a symptom of this disease, like headache, will be denoted P(S) = 0.1. We assume known the probability (likelihood) of having a headache if we have meningitis:

$$P(S/M) = 0.9 \tag{6.50}$$

Bayes' theorem then gives the a posteriori probability of having meningitis if we have a headache:

$$P(M/S) = \frac{P(S/M)P(M)}{P(S)} = \frac{0.9\cdot 0.001}{0.1} = 0.009 \tag{6.51}$$

that is, only 0.9%!

We also note that, if the events $B_1, B_2, \ldots, B_n$ form a partition of U (they are mutually incompatible and their union is U), then:

$$\begin{aligned} P(A) &= P((B_1\cup B_2\cup\ldots\cup B_n)\cap A) = P((B_1\cap A)\cup(B_2\cap A)\cup\ldots\cup(B_n\cap A)) \\ &= P(B_1\cap A) + P(B_2\cap A) + \ldots + P(B_n\cap A) \\ &= P(A/B_1)P(B_1) + P(A/B_2)P(B_2) + \ldots + P(A/B_n)P(B_n) \end{aligned} \tag{6.52}$$

So we can know the probability of the event A knowing the elementary probabilities $P(B_i)$ of its causes and the conditional probabilities of A for each $B_i$:

$$P(A) = \sum_{i=1}^{n} P(A/B_i)P(B_i) \tag{6.53}$$

which is called the "formula of total probabilities" or "total probability theorem".

But also, for any j, we have the following corollary, which gives us, once an event A has been observed, the probability that it is the cause $B_j$ that produced it:

$$P(B_j/A) = \frac{P(B_j\cap A)}{P(A)} = \frac{P(A/B_j)P(B_j)}{\sum_{i} P(A/B_i)P(B_i)} \tag{6.54}$$

which is the general form of the "Bayes formula" or "Bayes' theorem", which we will use a little in the Statistical Mechanics chapter and in the study of queueing theory (see section Quantitative Management). You should know that the implications of this theorem are considerable in daily life, in medicine, in industry and in the field of Data Mining.

We often find in the literature many examples of applications of the previous relation with only two possible causes $B_1$ and $B_2 = \bar{B}_1$ of the event A.
Therefore we find the Bayes formula written in the following form for each of the two causes:

$$\begin{aligned} P(B_1/A) &= \frac{P(A/B_1)P(B_1)}{P(A/B_1)P(B_1) + P(A/B_2)P(B_2)} = \frac{P(A/B_1)P(B_1)}{P(A/B_1)P(B_1) + P(A/\bar{B}_1)P(\bar{B}_1)} \\ P(B_2/A) &= \frac{P(A/B_2)P(B_2)}{P(A/B_1)P(B_1) + P(A/B_2)P(B_2)} = \frac{P(A/B_2)P(B_2)}{P(A/\bar{B}_2)P(\bar{B}_2) + P(A/B_2)P(B_2)} \end{aligned} \tag{6.55}$$

and note that in this particular case (binary outcomes):

$$P(B_1/A) + P(B_2/A) = \frac{P(A/B_1)P(B_1) + P(A/B_2)P(B_2)}{P(A/B_1)P(B_1) + P(A/B_2)P(B_2)} = 1 \tag{6.56}$$

which is an intuitive result.

For binary events, we also have (returning to the total probability theorem seen above):

$$P(A) = \sum_{i=1}^{2} P(A/B_i)P(B_i) = P(A/B_1)P(B_1) + P(A/B_2)P(B_2) = P(A/B_1)P(B_1) + P(A/\bar{B}_1)P(\bar{B}_1) \tag{6.57}$$

Examples:

E1. A disease affects 10 people out of 10,000 (0.1% = 0.001). A test has been developed which gives 5% false positives (people not having the disease but for whom the test says they are affected) but always detects the disease if a person has it. What is the probability that a random person for whom the test gives a positive result really has the disease?

Among 10,000 people there are therefore about 500 false positives (5% of the 9,990 healthy people) and 10 people who really have the disease (and test positive). Writing M for "has the disease" and T for "positive test", we have $P(M) = 0.001$, $P(\bar{M}) = 0.999$, $P(T/M) = 1$ and $P(T/\bar{M}) = 0.05$, so that:

$$\begin{aligned} P(T) &= P(T/M)P(M) + P(T/\bar{M})P(\bar{M}) = 1\cdot 0.001 + 0.05\cdot 0.999 = 0.05095 \\ P(M/T) &= \frac{P(T/M)P(M)}{P(T)} = \frac{1\cdot 0.001}{0.05095} \cong 0.019627 \end{aligned} \tag{6.58}$$

i.e. only about 2%, roughly 10 true positives out of some 510 positive tests. This is often a shocking and counter-intuitive result. It also highlights why diagnostic tests must be extremely reliable!

E2. Two machines M1 and M2 produce respectively 100 and 200 pieces.
M1 produces 5% defective pieces and M2 produces 6% (these are the likelihoods $P(A/M_i)$ of the event A "the piece is defective" for each machine). What is the a posteriori probability that a defective piece was manufactured by the machine M1? We then have, with $P(M_1) = 100/300$ and $P(M_2) = 200/300$:

$$P(M_1/A) = \frac{P(A/M_1)P(M_1)}{\sum_i P(A/M_i)P(M_i)} = \frac{P(A/M_1)P(M_1)}{P(A/M_1)P(M_1) + P(A/M_2)P(M_2)} = \frac{\frac{5}{100}\cdot\frac{100}{300}}{\frac{5}{100}\cdot\frac{100}{300} + \frac{6}{100}\cdot\frac{200}{300}} = \frac{5}{17} \cong 29.4\% \tag{6.59}$$

E3. From a batch of 10 pieces with 30% defective, we take a sample of size 3 without replacement. What is the probability that the second piece is correct (whatever the first is)? Let A be the event "the second piece is correct", $B_1$ "the first piece is correct" and $B_2$ "the first piece is defective". We have:

$$P(A) = \sum_{i=1}^{2} P(A/B_i)P(B_i) = P(A/B_1)P(B_1) + P(A/B_2)P(B_2) = \frac{6}{9}\cdot\frac{7}{10} + \frac{7}{9}\cdot\frac{3}{10} = 70\% \tag{6.60}$$

E4. We conclude with an important example for companies, where employees more and more often have to pass exams or assessments in the form of multiple choice questions (MCQs) during their career. When an employee answers a question, there are two possibilities: either he knows the answer or he tries to guess it. Let p be the probability that the employee knows the answer, and therefore $1-p$ that he guesses it. We admit that an employee who guesses will answer correctly with a probability of 1/m, where m is the number of proposed answers. What is the a posteriori probability that an employee (really) knows the answer to a question with 5 choices if he answered correctly?

Let B and A be respectively the events "the employee knows the answer" and "the employee correctly answers the question". Then the a posteriori probability that an employee (really) knows the answer to a question that he answered correctly is:

$$P(B/A) = \frac{P(A/B)P(B)}{P(A/B)P(B) + P(A/\bar{B})P(\bar{B})} = \frac{1\cdot p}{1\cdot p + \frac{1}{m}(1-p)} = \frac{mp}{1 + (m-1)p} \tag{6.61}$$

that is $5p/(1+4p)$ for m = 5 proposed answers.

Bayesian analysis also provides a powerful tool to formalize reasoning under uncertainty, and the examples above illustrate how difficult this tool can be to use.
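The total probability formula (6.53) and the Bayes formula (6.54) are easy to mechanize. The following Python sketch, a minimal illustration rather than a reference implementation, uses exact fractions to recompute the meningitis example and examples E1 to E4. The helper names are hypothetical, and the value p = 1/2 used for E4 is an arbitrary choice made only for the illustration.

```python
from fractions import Fraction


def total_probability(likelihoods, priors):
    """P(A) = sum_i P(A/B_i) P(B_i), relation (6.53); the B_i must partition U."""
    return sum(l * p for l, p in zip(likelihoods, priors))


def bayes(likelihoods, priors, j):
    """Posterior P(B_j/A) given by the Bayes formula (6.54)."""
    return likelihoods[j] * priors[j] / total_probability(likelihoods, priors)


# Meningitis: P(M/S) = P(S/M) P(M) / P(S), with P(S) given directly.
p_m_given_s = Fraction(9, 10) * Fraction(1, 1000) / Fraction(1, 10)
print("meningitis:", p_m_given_s, float(p_m_given_s))  # 9/1000 0.009

# E1: causes "sick" / "not sick", observed event "positive test".
e1 = bayes([Fraction(1), Fraction(5, 100)], [Fraction(1, 1000), Fraction(999, 1000)], 0)
print("E1:", float(e1))  # about 0.0196

# E2: causes M1 / M2 (100 and 200 pieces), observed event "defective piece".
e2 = bayes([Fraction(5, 100), Fraction(6, 100)], [Fraction(1, 3), Fraction(2, 3)], 0)
print("E2:", e2)  # 5/17

# E3: causes "first piece good" / "first piece defective", event "second piece good".
e3 = total_probability([Fraction(6, 9), Fraction(7, 9)], [Fraction(7, 10), Fraction(3, 10)])
print("E3:", e3)  # 7/10

# E4: p = 1/2 is an arbitrary illustrative value, m = 5 proposed answers.
p, m = Fraction(1, 2), 5
e4 = bayes([Fraction(1), Fraction(1, m)], [p, 1 - p], 0)
print("E4:", e4)  # 5/6, i.e. 5p/(1 + 4p) for p = 1/2
```

Exact fractions make it easy to compare the results with the closed forms above, for instance $5/17$ for E2.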
6.3.1 Conditional Expectation

Now we turn to the continuous version of conditional probability. We introduce the subject directly with a particular example (the general theory being indigestible) that is extremely important in the fields of social statistics and quantitative finance. However, this choice (the study of a particular case) assumes that the reader has read the first chapter of Statistics, which studies the functions of continuous distributions and in particular the Pareto law.

So here is the scenario: in the social sciences or economics literature dealing with Pareto laws, we often find statements like the following (but almost never with a detailed proof): if incomes follow a Pareto random variable, then whatever your income, the average income of those who have an income above yours is in a constant ratio, greater than 1, to your income. We then say that the law is isomorphic to any of its truncated parts (truncated below any level $x_0$, the Pareto law is again a Pareto law, with minimum $x_0$ and the same exponent k).

Let's see what this means exactly. Let X be a random variable equal to the income and following a Pareto law with density (see section Statistics):

$$f(x) = \frac{k\,x_m^k}{x^{k+1}} \tag{6.62}$$

with $k > 1$, $x_m > 0$, $x \geq x_m$, and that has for distribution function (see also the Statistics chapter for the detailed proof):

$$P(X\leq x) = 1 - \left(\frac{x_m}{x}\right)^k \tag{6.63}$$

The statement begins with "whatever your income", so select any income $x_0 > x_m$. Now we need to compute "the average income of those with an income higher than $x_0$". We are therefore asked to calculate the expectation (average income) of a new random variable Y that is equal to X, but restricted to the population of people with an income above $x_0$:

$$Y = X\big|_{\{X > x_0\}} \tag{6.64}$$

The distribution function of Y is given by:

$$P(Y\leq x) = P(X\leq x \,|\, X > x_0) \tag{6.65}$$

This expression is of course equal to zero if $x < x_0$. Well, so far we have only done vocabulary.
First recall the following conditional probability relation already seen before:

$$P(B/A) = \frac{P(A\cap B)}{P(A)} \tag{6.66}$$

so that for $x \geq x_0$ we have for the conditional law:

$$P(X\leq x \,|\, X > x_0) = \frac{P(x_0 < X \leq x)}{P(X > x_0)} \tag{6.67}$$

Before going further, note that only the numerator depends on x: the denominator $P(X > x_0)$ is a constant that normalizes the conditional law, and the ratio as a whole is the distribution function of a single random variable, which we denote Y. So we see that the density of Y is given by the function:

$$f_Y(y) = \begin{cases} 0 & y < x_0 \\ \dfrac{f(y)}{P(X > x_0)} & y \geq x_0 \end{cases}$$

Now we can calculate the expectation of Y:

$$E(Y) = \int_{-\infty}^{+\infty} y\,f_Y(y)\,dy = \frac{1}{P(X > x_0)}\int_{x_0}^{+\infty} y\,f(y)\,dy = \frac{1}{P(X > x_0)}\int_{x_0}^{+\infty}\frac{k\,x_m^k}{y^{k}}\,dy = \frac{1}{P(X > x_0)}\cdot\frac{k\,x_m^k}{(k-1)\,x_0^{k-1}} \tag{6.68}$$

Knowing that (see section Statistics):

$$P(X > x_0) = \int_{x_0}^{+\infty}\frac{k\,x_m^k}{x^{k+1}}\,dx = \left(\frac{x_m}{x_0}\right)^k \tag{6.69}$$

we finally have:

$$E(Y) = \frac{k\,x_0}{k-1} \tag{6.70}$$

E(Y) represents the average income of those with an income above $x_0$ and, as can be seen from the above equality, it is in a constant ratio $k/(k-1)$, greater than 1, to your income $x_0$.

We can check this result with a Monte Carlo simulation in spreadsheet software (it is interesting to mention this in order to generalize to situations that cannot be computed by hand). You just need to simulate the inverse of the distribution function, with u uniformly distributed on [0, 1):

$$x = \frac{x_m}{(1-u)^{1/k}} \tag{6.71}$$

which in Microsoft Excel 11.8346, with k in cell B6 and $x_m$ in cell B7, can be written =($B$7^$B$6/(1-RAND()))^(1/$B$6). Then take the average of the values obtained above a given value (which corresponds to $x_0$) and check that we get the result proved above!

Obviously, we could also calculate the conditional variance (and hence the conditional standard deviation). It may come one day...
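The spreadsheet check described above can also be sketched in Python. This minimal sketch assumes illustrative parameters k = 3, $x_m$ = 1 and $x_0$ = 2 (not taken from the text). It draws Pareto incomes by the inverse transform (6.71), using Python's standard random module in place of RAND(), and compares the average of the draws above $x_0$ with $k x_0/(k-1)$ from (6.70).

```python
import random


def pareto_sample(k, x_m, rng):
    """Inverse-transform sampling of relation (6.71): x = x_m / (1 - u)^(1/k)."""
    u = rng.random()                  # u in [0, 1), so 1 - u is in (0, 1]
    return x_m / (1.0 - u) ** (1.0 / k)


def mean_above(k, x_m, x0, n, seed=0):
    """Monte Carlo estimate of E(X | X > x0) from n Pareto draws."""
    rng = random.Random(seed)
    draws = (pareto_sample(k, x_m, rng) for _ in range(n))
    above = [x for x in draws if x > x0]
    return sum(above) / len(above)


k, x_m, x0 = 3.0, 1.0, 2.0            # illustrative parameters (assumption)
estimate = mean_above(k, x_m, x0, n=200_000)
exact = k * x0 / (k - 1)              # relation (6.70): 3.0 here
print(f"Monte Carlo: {estimate:.3f}  exact: {exact:.3f}")
```

With these parameters only about one draw in eight, $(x_m/x_0)^k = 1/8$, exceeds $x_0$, and the estimate should fall close to 3. For $k \leq 2$ the variance of the Pareto law is infinite, and much larger samples would be needed for a comparable precision.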
6.3.2 Bayesian Networks

Bayesian networks are simply a graphical representation of a problem of conditional probabilities. They make it easier to visualize the interactions between the different variables when these become numerous.

This technique is increasingly used in decision-support software (Data Mining), artificial intelligence (A.I.) and also in risk analysis and management (ISO 31010 standard).

Bayesian networks are by definition directed acyclic graphs (see section Graphs Theory), so that an event cannot (even indirectly) influence its own probability, completed by a quantitative description of the dependencies between events. These graphs are used both as knowledge representation models and as machines for calculating conditional probabilities. They are mainly used for diagnosis (medical and industrial), risk analysis (diagnosis of failures, faults or accidents), spam detection (Bayesian filter), speech, text and image analysis, opinion analysis, detection of fraud or bad payers, as well as data mining (M.K.M.: Mining and Knowledge Management) in general.

Remark: Many systems and software packages exist to build and analyse Bayesian networks from drawings or from information in existing databases. Paid solutions: SQL Server, Oracle, Hugin. Free solutions (at this date): Tanagra, Microsoft Belief Network MSBNX 1.4.2, RapidMiner. Personally I prefer the simplicity of the small MSBNX software from Microsoft. For information, in 15 years of professional experience as a consultant, I have so far met only one company out of more than 800 multinationals that used Bayesian networks (in transportation).

Using a Bayesian network amounts to doing "Bayesian inference": based on the observed information, we calculate the probability of data that are known to be possible but have not been observed. For a given domain (e.g. medical), we describe the causal relations between the variables of interest by a graph (we do not need to specify again that it is acyclic).
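The acyclicity requirement can be checked mechanically. The following Python sketch uses a standard topological sort (Kahn's algorithm, which is our choice and not something the text prescribes). As a hypothetical example it takes the five-variable network studied below: S1 work accident, S2 machine failure, S3 alarm, S4 production stop and S5 evacuation, with the parents of each node read off the conditional probability tables of that example. The function returns an order in which every node comes after its parents, or None when a cycle would let an event influence its own probability.

```python
from collections import deque


def topological_order(parents):
    """Kahn's algorithm: return the nodes parent-first, or None if there is a cycle.

    `parents` maps every node to the list of its parent nodes.
    """
    children = {node: [] for node in parents}
    indegree = {node: len(ps) for node, ps in parents.items()}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    queue = deque(node for node, d in indegree.items() if d == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:      # all parents already placed
                queue.append(child)
    return order if len(order) == len(parents) else None


# Structure of the example network (parents of each node), assumed from its tables.
network = {"S1": [], "S2": [], "S3": ["S1", "S2"], "S4": ["S2"], "S5": ["S4"]}
print(topological_order(network))   # ['S1', 'S2', 'S3', 'S4', 'S5']

# A hypothetical extra arrow S5 -> S2 creates a cycle: the graph is rejected.
cyclic = dict(network, S2=["S5"])
print(topological_order(cyclic))    # None
```

Processing the nodes in such a parent-first order is what allows the probabilities of a network to be computed from the root nodes down to their descendants.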
In such a causal graph, the relations between variables are not deterministic but probabilistic. Thus, the observation of one or several causes does not always imply the effect or effects that depend on them, but only changes the probability of observing them. The particular interest of Bayesian networks is that they simultaneously take into account the a priori knowledge of experts (in the graph) and the experience contained in the data.

Example of 5 variables with their relations (directed acyclic graph) and numbering of the states/variables:

Obviously, the construction of the causal graph is based primarily on feedback from experience (REX) and sometimes on standards or on reports of expert committees. In computer science, the causal graph can change automatically depending on the content of databases (think of Amazon's real-time targeted advertisements based on your past purchases, or of Apple's Genius service). But we can rarely think of all the possibilities, and there will sometimes be forgotten hidden states between two known states that would have allowed a better model of the situation.

Suppose in the example above that, with the help of a corporate database, we know that over about 100,000 man-days the company had 1,000 work accidents (i.e. 1% of the total) and 100 machine failures (i.e. 0.1% of the total). We then represent it in the traditional form as follows:

$P(S1) = [1\%,\ 99\%]$    $P(S2) = [0.1\%,\ 99.9\%]$

where the subset S2, S4, S5 forms what experts call a "serial" or "linear connection", and the triplet S3, S2, S4 is called a "divergent connection" (if the arrows were reversed for this triplet, we would have a "converging connection").

Before going further with our example, we will make some observations about these three types of relations. For clarity, we first distinguish "conditional independence" and "conditional dependence".
We say that events A and C are "conditionally independent" given an event B if the following equality holds:

$$P(A/B) = P(A/B,C) \tag{6.72}$$

So the term "conditional" implies the presence of B, and the fact that C does not influence the probability of the event A once B is known.

Regarding "conditional dependence", we can this time distinguish three types of relations.

1. The conditional dependence of the following type is called a "serial" or "linear connection" (already mentioned above):

where A, B and C are dependent (in this particular example there are 3 dependent nodes A, B and C, but in general this dependence extends to all the nodes of the chain if there are more than 3). A influences C only through B: if the variable B is known, A no longer provides any useful information about C (the path of uncertainty is somehow broken), and therefore A and C become conditionally independent given B. The conditional probability then simplifies as follows:

$$P(C/B,A) = P(C/B) \tag{6.73}$$

2. The conditional dependence of the following type is called a "divergent connection" (as already mentioned above):

Figure 4.53 - Divergent Bayesian network

Here B and C both depend on their common parent A, and are therefore dependent. But if A is known, B no longer provides any information about C (again the path of uncertainty is somehow broken), and therefore B and C become conditionally independent given A. We then have, for example, if A is known:

$$P(C/A,B) = P(C/A) \tag{6.74}$$

3. The conditional dependence of the following type is called a "converging connection" or "V-structure" (as already mentioned above):

where this time the parents B and C are independent. So B and C are independent, but become conditionally dependent once their common child A is known. We then have:

$$P(B/C) = P(B) \quad\text{but in general}\quad P(B/A,C) \neq P(B/A) \tag{6.75}$$

The dependence between the parents therefore requires the observation of their common child.
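These properties can be verified numerically by building the full joint probability table of three binary variables. In this Python sketch, the joint probability is obtained as the product of each variable's probability given its parents (the usual factorization of a Bayesian network). All the numbers are arbitrary values chosen only for illustration, and the helper names are hypothetical. The sketch checks that in a serial connection C no longer depends on A once B is known. It also checks that in a converging connection the independent parents B and C become dependent once their common child A is observed.

```python
from itertools import product


def bern(p, value):
    """Probability that a binary variable with P(yes) = p takes `value`."""
    return p if value else 1 - p


def conditional(joint, target, given):
    """P(target / given) by summing a joint table over all (a, b, c) assignments.

    joint: dict mapping (a, b, c) booleans to probabilities.
    target, given: dicts {index: value} with indices 0 = A, 1 = B, 2 = C.
    """
    def prob(fixed):
        return sum(p for x, p in joint.items()
                   if all(x[i] == v for i, v in fixed.items()))
    return prob({**given, **target}) / prob(given)


# Serial connection A -> B -> C (arbitrary illustrative numbers).
serial = {(a, b, c): bern(0.3, a) * bern(0.9 if a else 0.2, b) * bern(0.7 if b else 0.1, c)
          for a, b, c in product([True, False], repeat=3)}
print(conditional(serial, {2: True}, {1: True, 0: True}),  # P(C/B,A)
      conditional(serial, {2: True}, {1: True}))           # P(C/B): both 0.7, up to rounding

# Converging connection B -> A <- C (arbitrary numbers); P(A = yes) depends on (B, C).
p_a = {(True, True): 0.95, (True, False): 0.6, (False, True): 0.5, (False, False): 0.05}
converging = {(a, b, c): bern(0.4, b) * bern(0.3, c) * bern(p_a[(b, c)], a)
              for a, b, c in product([True, False], repeat=3)}
print(conditional(converging, {1: True}, {2: True}),          # P(B/C) = P(B) = 0.4
      conditional(converging, {1: True}, {0: True}),          # P(B/A), about 0.718
      conditional(converging, {1: True}, {0: True, 2: True}))  # P(B/A,C), about 0.559
```

In the converging case, observing C in addition to A lowers the probability of B. C already "explains" part of the observation of A, which is exactly the dependence between parents induced by their common child.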
Now, to make a concrete example, suppose our database tells us (thanks to quality managers who always record quality issues) that when a machine failure occurred, there was a total production stop 99 times out of 100 (99%), i.e. 1% of the time there was no production stop. Suppose it also tells us that when there was no machine failure, there was nevertheless a production stop 1% of the time. We traditionally represent this as follows:

$P(S1) = [1\%,\ 99\%]$    $P(S2) = [0.1\%,\ 99.9\%]$

So the "implicit probability" that there is a production stop is given by:

$$P(S4) = P(S4/S2)P(S2) + P(S4/\bar{S2})P(\bar{S2}) = 99\%\cdot 0.1\% + 1\%\cdot 99.9\% = 1.098\% \tag{6.76}$$

This value represents the implicit proportion of production stops over the 100,000 man-days. We can therefore give the proportion of rows in the database that represent a production stop, regardless of the cause and even without knowing the details of the database. It then follows immediately that the implicit probability that there is no production stop is given by:

$$P(\bar{S4}) = 1 - P(S4) = 98.902\% \tag{6.77}$$

This is consistent with the result given by the freeware MSBNX 1.4.2.

Now suppose we observed a production stop. What is the a posteriori probability that it is due to a machine failure? We then have:

$$P(S2/S4) = \frac{P(S4/S2)P(S2)}{P(S4)} = \frac{99\%\cdot 0.1\%}{1.098\%} \cong 9.02\% \tag{6.78}$$

We can also check this with the software MSBNX 1.4.2:

Figure 4.57 - A Posteriori probability of a production stop due to a machine failure in MSBNX 1.4.2

Now, imagine that our database tells us (again thanks to quality managers who made sure to record quality issues) that when there was a production stop, there was an evacuation 99 times out of 100 (99%). It also tells us that when there was no production stop, there was nevertheless an evacuation 5% of the time (fire drills or other events), i.e. no evacuation 95% of the time:

Figure 4.58 - 2nd level Bayesian network, with $P(S1) = [1\%,\ 99\%]$ and $P(S2) = [0.1\%,\ 99.9\%]$

Now, to calculate the implicit probability of an evacuation, recall that in a serial connection the conditional probability depends only on the direct parent. Thus, we get:

$$P(S5) = P(S5/S4)P(S4) + P(S5/\bar{S4})P(\bar{S4}) = 99\%\cdot 1.098\% + 5\%\cdot 98.902\% \cong 6.03\% \tag{6.79}$$

We can also check this with the software MSBNX 1.4.2:

Figure 4.59 - Implicit probability of an evacuation in MSBNX 1.4.2 (network MachineFailure → ProductionStop → Evacuation)

So, once the implicit probability of a production stop is known, computing the implicit probability of an evacuation does not require any further information about machine failures.

Now suppose we have observed an evacuation. We want to know the a posteriori probability that it is due to a production stop! Then we have:

$$P(S4/S5) = \frac{P(S5/S4)P(S4)}{P(S5)} = \frac{99\%\cdot 1.098\%}{6.03\%} \cong 18.02\% \tag{6.80}$$

We can also check this with the software MSBNX 1.4.2:

Figure 4.60 - A Posteriori probability of an evacuation due to a machine failure in MSBNX 1.4.2 (with Evacuation = Yes observed: Machine Failure Yes 0.0163 / No 0.9837; Production Stop Yes 0.1802 / No 0.8198)

Now we study the case with the alarm, and again a database allows us to build a table with different probabilities:

$P(S1) = [1\%,\ 99\%]$    $P(S2) = [0.1\%,\ 99.9\%]$

P(S3/S1,S2) (alarm):
Work accident S1, Machine failure S2 | Alarm Yes | Alarm No
Yes, Yes | 75% | 25%
Yes, No | 10% | 90%
No, Yes | 99% | 1%
No, No | 10% | 90%

P(S4/S2) (production stop):
Machine failure S2 | Production stop Yes | Production stop No
Yes | 99% | 1%
No | 1% | 99%

P(S5/S4) (evacuation):
Production stop S4 | Evacuation Yes | Evacuation No
Yes | 99% | 1%
No | 5% | 95%

Figure 4.61 - 2nd level Bayesian network with second branch

Now, to calculate the implicit probability that there is an alarm, we have to consider the four possible situations. We then use the total probability theorem (S1 and S2 having no parent, they are independent, so the probability of each combination of their states is the product of their probabilities):

$$P(S3) = P(S3/S1,S2)P(S1)P(S2) + P(S3/\bar{S1},S2)P(\bar{S1})P(S2) + P(S3/S1,\bar{S2})P(S1)P(\bar{S2}) + P(S3/\bar{S1},\bar{S2})P(\bar{S1})P(\bar{S2}) \tag{6.81}$$

which, a little more rigorously, should be written:

$$P(S3) = P(S3/S1\wedge S2)P(S1)P(S2) + P(S3/\bar{S1}\wedge S2)P(\bar{S1})P(S2) + P(S3/S1\wedge\bar{S2})P(S1)P(\bar{S2}) + P(S3/\bar{S1}\wedge\bar{S2})P(\bar{S1})P(\bar{S2}) \tag{6.82}$$

The numerical application therefore gives for the implicit probability of an alarm:

$$P(S3) = 75\%\cdot 1\%\cdot 0.1\% + 99\%\cdot 99\%\cdot 0.1\% + 10\%\cdot 1\%\cdot 99.9\% + 10\%\cdot 99\%\cdot 99.9\% \cong 10.089\% \tag{6.83}$$

which can be built and checked as follows with MSBNX 1.4.2:

Figure 4.62 - Implicit probability in MSBNX 1.4.2 (nodes WorkAccident, MachineFailure, Alarm, ProductionStop, Evacuation; the Alarm node table reproduces the probabilities above)

It may be useful for the reader to know that the following notation is sometimes found in the literature:

$$P(S3/S1 = \text{Yes}, S2 = \text{Yes}) = P(S3/S1,S2) = P(S3/S1\wedge S2) \tag{6.84}$$

Remark: In the particular example studied above, all events have only two states. But in practice they can have 3, 4 or more states. The cross-tables of probabilities therefore quickly become enormous.

As in the previous case, suppose we know that there was a work accident. We then wish to calculate the probability of an alarm.
We then have (observe that we now only need to sum over the states of S2, since the state of S1 is completely known and S2 does not depend on S1):

$$P(S3/S1) = P(S3/S1,S2)P(S2) + P(S3/S1,\bar{S2})P(\bar{S2}) = 75\%\cdot 0.1\% + 10\%\cdot 99.9\% = 10.065\% \tag{6.85}$$

We can also check this with the software MSBNX 1.4.2:

Figure 4.63 - Implicit probability of an alarm in MSBNX 1.4.2 (with WorkAccident = Yes observed: Alarm Yes 0.1007 / No 0.8994)

So, knowing that there was a work accident slightly decreases the probability that there is an alarm (we go from a probability of 10.089% to a probability of 10.065%). This is because, in the table above, the alarm is triggered less often when a machine failure comes with a work accident (75%) than when it comes alone (99%).

To complete this example, we would like to calculate the a posteriori probabilities P(S2/S3) and P(S1/S3). To do this, we must first calculate the conditional probabilities (likelihoods) P(S3/S2) and P(S3/S1) (the latter has just been calculated). We have for the missing value (which can easily be checked as before with the MSBNX 1.4.2 software):

$$P(S3/S2) = P(S3/S2,S1)P(S1) + P(S3/S2,\bar{S1})P(\bar{S1}) = 75\%\cdot 1\% + 99\%\cdot 99\% \cong 98.76\% \tag{6.86}$$

We then have:

$$P(S3/S1) = 10.065\% \qquad P(S3/S2) = 98.76\% \tag{6.87}$$

We now have everything we need to calculate the a posteriori probabilities P(S2/S3) and P(S1/S3):

$$P(S2/S3) = \frac{P(S3/S2)P(S2)}{P(S3)} = \frac{98.76\%\cdot 0.1\%}{10.089\%} \cong 0.9789\% \qquad P(S1/S3) = \frac{P(S3/S1)P(S1)}{P(S3)} = \frac{10.065\%\cdot 1\%}{10.089\%} \cong 0.9976\% \tag{6.88}$$

So the a posteriori probability that there is a machine breakdown when we know that there is an alarm is 0.979% (i.e. a probability of 99.021% that no machine failure is involved). Likewise, there is an a posteriori probability of 0.998% that there is a work accident when we know that there is an alarm (and therefore 99.002% that no work accident is involved).
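All the values of this example can be recomputed by brute force. The following Python sketch assumes the network structure read off the tables above: S1 and S2 without parents, S3 depending on S1 and S2, S4 on S2, and S5 on S4. It writes the joint probability as the product of each node's probability given its parents, and obtains every marginal or conditional probability by summing over the $2^5 = 32$ combinations of states. This enumeration is only practical for very small networks, and the function names are hypothetical.

```python
from itertools import product

# Probabilities of the example (True = "Yes").
P_S1 = 0.01                       # work accident
P_S2 = 0.001                      # machine failure
P_S3 = {(True, True): 0.75, (True, False): 0.10,   # alarm, keyed by (S1, S2)
        (False, True): 0.99, (False, False): 0.10}
P_S4 = {True: 0.99, False: 0.01}  # production stop, keyed by S2
P_S5 = {True: 0.99, False: 0.05}  # evacuation, keyed by S4


def bern(p, value):
    """Probability that a binary variable with P(yes) = p takes `value`."""
    return p if value else 1 - p


def joint(s1, s2, s3, s4, s5):
    """Joint probability: each node conditioned on its parents only."""
    return (bern(P_S1, s1) * bern(P_S2, s2) * bern(P_S3[(s1, s2)], s3)
            * bern(P_S4[s2], s4) * bern(P_S5[s4], s5))


def prob(**fixed):
    """Marginal probability of the fixed values, by enumeration of all 32 states."""
    names = ("s1", "s2", "s3", "s4", "s5")
    total = 0.0
    for values in product([True, False], repeat=5):
        state = dict(zip(names, values))
        if all(state[name] == v for name, v in fixed.items()):
            total += joint(**state)
    return total


def cond(target, given):
    """P(target / given), both given as dicts of observed values."""
    return prob(**target, **given) / prob(**given)


print(f"P(S4)    = {prob(s4=True):.3%}")                      # 1.098%
print(f"P(S2/S4) = {cond({'s2': True}, {'s4': True}):.2%}")   # 9.02%
print(f"P(S5)    = {prob(s5=True):.2%}")                      # 6.03%
print(f"P(S4/S5) = {cond({'s4': True}, {'s5': True}):.2%}")   # 18.02%
print(f"P(S2/S5) = {cond({'s2': True}, {'s5': True}):.2%}")   # 1.63%
print(f"P(S3)    = {prob(s3=True):.3%}")                      # 10.089%
print(f"P(S3/S1) = {cond({'s3': True}, {'s1': True}):.3%}")   # 10.065%
print(f"P(S3/S2) = {cond({'s3': True}, {'s2': True}):.2%}")   # 98.76%
print(f"P(S2/S3) = {cond({'s2': True}, {'s3': True}):.4%}")   # 0.9789%
print(f"P(S1/S3) = {cond({'s1': True}, {'s3': True}):.4%}")   # 0.9976%
```

The line for P(S2/S5) corresponds to the value 0.0163 displayed by MSBNX for the machine failure node when an evacuation is observed.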
From a critical point of view, when an alarm finally occurs we cannot conclude much... This is because, in this case, both events of significant interest (work accident and machine failure) have a low probability of occurring, and the employees respond quite well when the alarm starts (otherwise, if the a posteriori probabilities were high, it would mean that the employees' behaviour is not good, because we could guess in advance, with exasperation and with good confidence, which problem has occurred).

To conclude, the reader will have noticed that the calculations quickly become tedious when the graph becomes complex, which explains the use of computer software. Furthermore, in the banking sector, which uses Bayesian networks for example for credit risk, the probabilities sought can be more complex. For example, we might want to know the a posteriori probability that there is a machine failure knowing that we have both an alarm and an accident:

$$P(S_2/S_3, S_1) = P(S_2/S_3 = \text{Yes}, S_1 = \text{Yes})$$

6.4 Martingales

A martingale in probability (the term has another meaning in stochastic processes) is a technique intended to increase the chances of winning at gambling while respecting the rules of the game. The principle depends entirely on the type of game we are focusing on, but the term is surrounded by an aura of mystery, as if some players knew efficient secret techniques to cheat chance.

For example, many players (or would-be players) search for THE martingale that will beat the bank in the most common casino games (casinos being institutions whose profitability relies almost entirely on the difference, however small, between the chances of winning and losing). Many martingales are only the dream of their author, some are actually inapplicable, and some could actually make it possible to cheat a little.
Gambling games in general are unfair: whatever the bet played, the probability of winning of the casino (or of the State in the case of a lottery) is greater than that of the player. In this type of game it is not possible to reverse the odds, only to minimize the probability of gambler's ruin.

The most common example is the roulette martingale. It consists in playing an even-money chance at roulette (red or black, odd or even) in order to win, for example, one unit over a series of spins, by doubling the bet after each loss until a win occurs.

Example: the player bets 1 unit on red. If red comes out, he stops playing and has won 1 unit (a payout of 2 units minus the 1-unit stake). If black comes out, he doubles his bet by betting 2 units on red, and so on until he wins.

[Figure 4.64 - Casino roulette wheel]

Having one chance in two of winning each spin, he may think he will eventually win, and when he wins he is necessarily paid back everything he has bet plus one unit, the amount of his initial bet. This martingale appears safe in practice. Note, however, that in theory, to be sure of winning, we would need to be able to play an unlimited number of times... This has major drawbacks:

• This martingale is in fact limited by the bets the player can afford, because the bet must be doubled after every loss: 2 times the initial bet, then 4, 8, 16... If he loses 10 times, he must be able to bet 1024 times his initial stake for the 11th game! That is a lot of money for a small gain!

• The roulette wheel also has a "0", which is neither red nor black. The risk of losing each spin is therefore larger than 1/2...

• In addition, to paralyze this strategy, casinos offer tables with betting limits: from 1 to 100.-, from 2 to 200.-, from 5 to 500.-, ... It is therefore impossible to apply this method over a large number of spins, which increases the risk of losing everything.
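The drawbacks listed above can be quantified exactly. The sketch below assumes a single-zero wheel (18 red pockets out of 37, so a probability $p = 18/37$ of winning an even-money bet) and a player who can afford at most $N$ consecutive bets of 1, 2, 4, ... units; `doubling_martingale` is a hypothetical helper. With probability $(1-p)^N$ all $N$ bets are lost, i.e. $2^N - 1$ units; otherwise the net gain is 1 unit. The expectation is therefore $1 - (2(1-p))^N$: exactly zero for a fair game, and negative as soon as the zero makes $1 - p > 1/2$.

```python
from fractions import Fraction


def doubling_martingale(max_bets: int, p_win: Fraction) -> tuple[Fraction, Fraction, int]:
    """Return (P(lose everything), expected gain, total amount at risk) for the
    doubling strategy with a base stake of 1 unit and at most max_bets bets."""
    p_lose_all = (1 - p_win) ** max_bets
    at_risk = 2 ** max_bets - 1            # 1 + 2 + 4 + ... + 2**(max_bets - 1)
    expected = (1 - p_lose_all) * 1 - p_lose_all * at_risk
    return p_lose_all, expected, at_risk


P_RED = Fraction(18, 37)   # assumption: single-zero wheel, 18 red pockets out of 37

for n in (5, 11):          # 11 bets = 10 losses followed by a bet of 1024 units
    ruin, ev, risk = doubling_martingale(n, P_RED)
    print(f"{n:2d} bets: P(lose all) = {float(ruin):.4%}, "
          f"expected gain = {float(ev):+.4f} units, at risk = {risk} units")
```

Exact fractions are used so that the fair-game case ($p = 1/2$) gives an expected gain of exactly zero rather than a rounding artefact.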
Blackjack is a game that has winning strategies: several playing techniques, which usually require memorizing the cards, can turn the odds in favour of the player. The mathematician Edward Thorp published a book in 1962 that was a real best-seller at the time. But all these methods require long weeks of training and are easily detected by the croupier (sudden changes in the amounts bet are typical). The casino then has the option of banning from its establishment the players who use this kind of martingale.

It should be noted that there are also fairly advanced methods. One of them is based on the least played combinations. In games where the gain depends on the number of winning players (Lotto...), playing the least played combinations maximizes gains. This is how some people sell combinations that are supposedly very rarely played by other players. Based on this reasoning, we can however conclude that a player who has managed to determine statistically the least played combinations, in order to maximize his expected payoff, will certainly not be the only player to have done so by analysing these famous combinations! This means that, in theory, the least played numbers are actually overplayed combinations; the best might be to mix played numbers and overplayed numbers to obtain the ideal combinations.

Another conclusion to all this may be that the best is to play random combinations, which ultimately are less likely to be chosen by players who introduce a human and harmonious factor in the choice of their numbers.

6.5 Combinatorial Analysis

"Combinatorial analysis" (counting techniques) is the field of mathematics that deals with the study of all the outcomes, events or facts (distinguishable or indistinguishable) and their arrangements (combinations), ordered or not, according to some given constraints.

Definitions (#73):

D1.
A sequence of objects (events, outcomes, objects, ...) is said to be "ordered" if each sequence with a particular order of the objects is recognized as a particular configuration.

D2. A sequence is "unordered" if and only if we are interested only in the frequency of appearance of the objects, regardless of their order.

D3. The objects (of a sequence) are said to be "distinct" if their characteristics cannot be confused with those of the other objects.

Remark

We chose to put combinatorial analysis in this chapter because, when we calculate probabilities, we also often need to know the probability of obtaining a given combination or arrangement of events under certain constraints.

Students often have difficulty remembering the difference between a permutation, an arrangement and a combination. Here is a short summary of what we will see:

• Permutation: we take all the objects.

• Arrangement: we choose objects from the original set and the order matters.

• Combination: same as for the arrangement, but the order does not matter.

Do not forget that, when all outcomes are equally likely, the inverse of each result gives the probability of obtaining one given permutation/arrangement/combination!

We will present and prove below the 6 most common cases, from which we can (usually) derive all the others:

6.5.1 Simple Arrangements with Repetition

Definition (#74): A "simple arrangement with repetition" is an ordered sequence of length m of n distinct objects, not necessarily all different in the sequence (i.e. with possible repetitions!).

Let A and B be two finite sets of respective cardinalities m and n, so that there are trivially m ways to choose an object in A (of type a) and n ways to choose an object in B (of type b). We saw in the section Set Theory that, if A and B are disjoint:

$$\mathrm{Card}(A \cup B) = \mathrm{Card}(A) + \mathrm{Card}(B) = m + n$$

We therefore deduce the following properties:

P1.
If an object cannot be of type a and of type b at the same time, and if there are m ways to select an object of type a and n ways to choose an object of type b, then the union of the objects gives m + n selections (this is typically the result of SQL UNION queries without filters in corporate Relational Database Management Systems).

P2. If we can choose an object of type a in m ways and then an object of type b in n ways, then, according to the Cartesian product of two sets (see section Set Theory), there are

$$\mathrm{Card}(A \times B) = \mathrm{Card}(A) \cdot \mathrm{Card}(B) = m \cdot n$$

ways to choose a single object of type a and then an object of type b.

With the same notation for m and n, we can choose for each element of A its single image among the n elements of B. So there are n ways to choose the image of the first element of A, then also n ways to choose the image of the second element of A, ..., and n ways to choose the image of the m-th element of A. The total number of possible maps from A to B is thus equal to the product of m factors equal to n (i.e. the cardinality of the Cartesian product of the set B with itself m times!). It is usual to write it in the following way (we have indicated the different ways of writing this result found in various textbooks):

$$\mathrm{Card}(B^A) = \mathrm{Card}(\underbrace{B \times B \times \dots \times B}_{m\ \text{times}}) = \mathrm{Card}(B)^m = \bar{A}_n^m = n^m$$

where $B^A$ is the set of maps from A to B. The increase in the number of possibilities is geometric (not "exponential" as it is often wrongly said!).

This result is mathematically analogous to the ordered result (an arrangement where the order of the elements in the sequence is taken into account) of m draws from a bag containing n different balls, with replacement after each draw. In France this result is traditionally named a "p-list".

Examples:

E1.
How many (ordered) "words" of 7 letters can we form from an alphabet of 24 distinct letters (very useful to know the number of trials needed to find a password, for example)? The solution is:

$$\bar{A}_{24}^7 = 24^7 = 4,586,471,424$$

E2. How many different voting patterns (and therefore groups of people voting identically) can there be in a referendum on 5 subjects, where each subject can be either accepted or rejected? The solution is (widely used in some Swiss companies):

$$\bar{A}_2^5 = 2^5 = 32$$

A simple generalization of this result is the following problem statement: if we have m objects $k_1, k_2, \dots, k_m$ such that $k_i$ may take $n_i$ different values, then the number of possible configurations is:

$$n_1 n_2 \cdots n_i \cdots n_m$$

And if $n_1 = n_2 = \dots = n_m = n$, we fall back on:

$$\bar{A}_n^m = n^m$$

6.5.2 Simple Permutations without Repetitions

Definition (#75): A "simple permutation without repetition" (formerly called "substitution") of n distinct objects is an ordered sequence of these n objects, which are by definition all different in the sequence (without repetition).

Remark

Be careful not to confuse the concept of permutation (of n elements among themselves) with that of arrangement (of n elements among m)!

The number of permutations of n items can be calculated by induction: there are n places for a first element, $n-1$ for a second element, ..., and only one place for the last remaining element. The number of permutations is therefore trivially given by:

$$n(n-1)(n-2)(n-3)\cdots(n-(n-1))$$

Recall that the product:

$$n(n-1)(n-2)(n-3)\cdots(n-(n-1)) = \prod_{i=1}^{n} i$$

is called the "factorial of n" and is denoted $n!$ for $n \in \mathbb{N}$. There are therefore, for n distinguishable elements:

$$A_n = \prod_{i=1}^{n} i = n!$$

possible permutations.
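The counting rules above (product rule, $n^m$, $n!$) can be checked by brute-force enumeration on small cases before being trusted on large ones. This is a minimal Python sketch (Python 3.9+ for the `list[int]` annotation); `arrangements_with_repetition` and `configurations` are hypothetical helper names introduced for the illustration.

```python
from itertools import permutations, product
from math import factorial, prod


def arrangements_with_repetition(n: int, m: int) -> int:
    """Ordered sequences of length m over n symbols, repetitions allowed: n**m."""
    return n ** m


def configurations(values_per_object: list[int]) -> int:
    """Generalization: object k_i can take n_i values -> n_1 * n_2 * ... * n_m."""
    return prod(values_per_object)


# Brute-force checks on small cases (enumerate, then count).
assert len(list(product("abcd", repeat=3))) == arrangements_with_repetition(4, 3)
assert len(list(product(range(2), range(3), range(4)))) == configurations([2, 3, 4])
assert len(list(permutations("abcde"))) == factorial(5)

print(arrangements_with_repetition(24, 7))  # 7-letter words over 24 letters: 4586471424
print(arrangements_with_repetition(2, 5))   # 5 yes/no referendum subjects: 32
print(factorial(7))                         # permutations of 7 distinct letters: 5040
```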
This type of calculation can be useful, for example, in project management (calculating the number of different orders in which n different parts ordered from external suppliers can arrive on a production line).

Example: How many (ordered) "words" can we create using each of 7 different letters exactly once (without repetition)?

$$A_7 = 7! = 5040$$

This result can be identified with the ordered outcome (a permutation $A_n$, in which the order of the elements in the sequence is taken into account) of drawing, without replacement, all the balls from a bag containing n distinguishable balls.

6.5.3 Simple Permutations with Repetitions

Definition (#76): A "simple permutation with repetition" is an ordered permutation of a sequence of n objects that are not necessarily all different, each type of object appearing a given number of times.

When some elements of a sequence of objects are not distinguishable (they are repeated in the sequence), the number of permutations we can make is trivially smaller than if all the elements were distinguishable.

Let $n_i$ be the number of objects of type i, with:

$$n_1 + n_2 + \dots + n_k = n$$

We then write:

$$A_n(n_1, \dots, n_k)$$

for the (yet unknown) number of possible permutations with repetition (one or more elements of the sequence are repeated and cannot be distinguished by permutation).

If the $n_i$ positions occupied by identical elements were occupied by different elements, the number of permutations would have to be multiplied by each of the $n_i!$ (previous case):

$$A_n(n_1, n_2, \dots, n_k)\, n_1!\, n_2! \cdots n_k! = n!$$

from which we deduce:

$$A_n(n_1, n_2, \dots, n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!}$$

If the n objects are all different in the sequence (so that $k = n$), we have:

$$n_1! = n_2! = \dots = n_k! = 1! = 1$$

and we fall back again on a simple permutation (without repetition):

$$A_n(n_1, n_2, \dots, n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!} = \frac{n!}{1!\,1! \cdots 1!} = n!$$

Example: How many (ordered) "words" can we create with the letters of the word "Mississippi" (M once, P twice, I four times, S four times)?

$$A_{11}(1, 2, 4, 4) = \frac{11!}{1!\,2!\,4!\,4!} = 34,650$$

This result can be identified with the ordered outcome (a permutation $A_n(n_1, \dots, n_k)$, in which the order of the sequence matters but identical elements are not distinguished) of drawing, without replacement, all the n balls of a bag containing $n_i$ indistinguishable balls of each type i.

6.5.4 Simple Arrangements without Repetitions

Definition (#77): A "simple arrangement without repetition" is an ordered sequence of p objects, all distinct, taken from n distinct objects with $n \geq p$.

We now propose to enumerate the possible arrangements of p objects among n without repetition. We denote by $A_n^p$ the number of these arrangements. It is easy to calculate that $A_n^1 = n$ and to check that $A_n^2 = n(n-1)$. Indeed, there are n ways to choose the first object and $(n-1)$ ways to choose the second once we already have the first.

To determine a nice expression for $A_n^p$, we reason by induction. We assume $A_n^{p-1}$ known and we deduce that:

$$A_n^p = A_n^{p-1}[n-(p-1)] = A_n^{p-1}(n-p+1)$$

It follows that:

$$A_n^p = n(n-1)(n-2)(n-3)\cdots(n-(p-1))$$

then:

$$n! = A_n^p\,(n-p)! = [n(n-1)(n-2)\cdots(n-p+2)(n-p+1)]\,(n-p)(n-p-1)\cdots 2 \cdot 1$$

whence:

$$A_n^p = \frac{n!}{(n-p)!}$$

This result can be identified with the ordered outcome (an arrangement $A_n^p$, in which the order of the elements in the sequence is taken into account) of drawing p distinct balls, without replacement, from a bag containing n different balls.

Example: Considering the 24 letters of the alphabet, how many (ordered) "words" of 7 distinct letters can we create?

$$A_{24}^7 = \frac{24!}{(24-7)!} = 1,744,364,160$$

The reader may have noticed that if $p = n$ we end up with (since $0! = 1$):

$$A_n^n = \frac{n!}{0!} = n!$$

So we can say that a simple permutation of n elements without repetition is a simple arrangement without repetition with $p = n$.

6.5.5 Simple Combinations without Repetitions

Definition (#78): A "simple combination without repetitions" or "choice function" is an unordered sequence (the order doesn't interest us!) of p elements, all different (not necessarily in the visual sense of the word!), selected from n distinct objects; it is by definition denoted $C_p^n$ in this book and named the "binomial" or "binomial coefficient".

If we permute the elements of each simple combination of p elements among n, we get all the simple arrangements $A_n^p$, and each combination gives $p!$ of them. Using the notation convention of this book (contrary to the one recommended by ISO 31-11!), we then have:

$$C_p^n = \frac{A_n^p}{p!} = \frac{n!}{p!\,(n-p)!}$$

It is a relation often used in gambling but also in industry through the hypergeometric distribution (see section Quantitative Management), as well as in quite high-level statistics such as order statistics (see chapter Statistics).

A simple way to remember this function is the following trick: suppose we must select p = 3 people among n = 6, regardless of order; what is the number of possibilities? Taking the order into account, we know that we have $6 \cdot 5 \cdot 4 = 120$ ways to select them! The calculation we just made is equal to $n!/(n-p)! = 6!/3! = 6 \cdot 5 \cdot 4$. But as the order must not be taken into account, we must divide the 120 by the number of ways we can arrange the 3 people of the group. So we divide 120 by $3!$, or more generally by $p!$, which gives $120/6 = 20$. Hence the relation above!
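The formulas of sections 6.5.2 to 6.5.5 can be cross-checked by brute force on small cases. This is a minimal Python sketch assuming Python 3.8+ for `math.perm` and `math.comb`; note that `math.comb(n, p)` corresponds to this book's $C_p^n$ and `math.perm(n, p)` to $A_n^p$. The function `multinomial` is a hypothetical helper written for this illustration.

```python
from collections import Counter
from itertools import combinations, permutations
from math import comb, factorial, perm


def multinomial(word: str) -> int:
    """Permutations with repetition: n! / (n_1! n_2! ... n_k!) for the letters of word."""
    result = factorial(len(word))
    for count in Counter(word).values():
        result //= factorial(count)   # each partial quotient is still an integer
    return result


# Arrangements without repetition A_n^p = n!/(n-p)!  (math.perm, Python 3.8+)
assert perm(24, 7) == factorial(24) // factorial(24 - 7)
# Combinations C_p^n = A_n^p / p!  (math.comb(n, p), Python 3.8+)
assert comb(24, 7) == perm(24, 7) // factorial(7)

# Brute-force checks on small cases: enumerate, then count.
assert len(list(permutations(range(6), 3))) == perm(6, 3) == 120
assert len(list(combinations(range(6), 3))) == comb(6, 3) == 20
assert len(set(permutations("aabb"))) == multinomial("aabb") == 6

print(multinomial("mississippi"))  # 34650
print(perm(24, 7))                 # 1744364160
print(comb(24, 7))                 # 346104
```

Enumerating with `itertools` is only feasible for small n, which is precisely why the closed-form expressions matter.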
This result can be identified with the unordered outcome (a combination $C_p^n$, in which the order of the elements of the sequence is not taken into account) of drawing p balls, without replacement, from a bag containing n different balls.

Examples:

E1. Considering the 24 letters of the alphabet, how many choices do we have to take 7 letters among the 24 without taking into account the order of the draw?

$$C_7^{24} = \frac{24!}{7!\,(24-7)!} = 346,104$$

The same value can be obtained with the function COMBIN() of Microsoft Excel 11.8346 (English version).

E2. In a Design of Experiments (see section Industrial Engineering) we have 2 factors of L = 3 levels each, and therefore we need N = 9 runs to completely determine all the interactions. If we consider that we can take a subset of S = 3 runs, how many combinations of 3 runs among the 9 can we choose if repetitions are forbidden?

$$C_3^9 = \frac{9!}{3!\,(9-3)!} = 84$$

We therefore understand why in Design of Experiments it is important to find a trick to choose the best subset (D-optimal designs).

There is, in relation to the binomial coefficient, another relation very often used in many case studies and also more generally in physics or functional analysis. This is "Pascal's formula" (for $1 \le p \le n-1$):

$$C_p^n = C_{p-1}^{n-1} + C_p^{n-1}$$

Proof 4.27.2.

$$C_{p-1}^{n-1} + C_p^{n-1} = \frac{(n-1)!}{((n-1)-(p-1))!\,(p-1)!} + \frac{(n-1)!}{(n-1-p)!\,p!} = \frac{(n-1)!}{(n-p)!\,(p-1)!} + \frac{(n-1)!}{(n-1-p)!\,p!}$$

We also have $p! = p(p-1)!$, hence:

$$(p-1)! = \frac{p!}{p}$$

and, because $(n-p)(n-p-1)! = (n-p)!$:

$$(n-p-1)! = \frac{(n-p)!}{n-p}$$

Then:

$$C_{p-1}^{n-1} + C_p^{n-1} = \frac{(n-1)!\,p}{(n-p)!\,p!} + \frac{(n-1)!\,(n-p)}{(n-p)!\,p!} = \frac{(n-1)!\,[p + (n-p)]}{(n-p)!\,p!} = \frac{(n-1)!\,n}{(n-p)!\,p!} = \frac{n!}{(n-p)!\,p!} = C_p^n$$

Q.E.D.

6.5.6 Simple Combinations with Repetitions

Definition (#79): A "simple combination with repetition" of p elements among n is a collection of p unordered elements, not necessarily distinct.
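Pascal's formula, and the count of combinations with repetition $(n+p-1)!/((n-1)!\,p!)$ derived below with Fermi's bars, can both be checked numerically. In this Python sketch, `math.comb(n, p)` plays the role of the book's $C_p^n$, `itertools.combinations_with_replacement` enumerates the unordered selections with repetition directly, and `gamma` is a hypothetical helper name for the closed-form count.

```python
from itertools import combinations_with_replacement
from math import comb


def gamma(n: int, p: int) -> int:
    """Combinations with repetition of p elements among n: C_p^(n+p-1) in the book's notation."""
    return comb(n + p - 1, p)


# Pascal's formula C_p^n = C_(p-1)^(n-1) + C_p^(n-1) for 1 <= p <= n-1;
# math.comb(n, p) is the book's C_p^n.
for n in range(2, 20):
    for p in range(1, n):
        assert comb(n, p) == comb(n - 1, p - 1) + comb(n - 1, p)

# Fermi's example: selections of 8 elements, with repetition, from {a, b, c, d, e, f}
count = sum(1 for _ in combinations_with_replacement("abcdef", 8))
print(count, gamma(6, 8))  # both equal 1287 = 13!/(5! 8!)
```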
Simple combinations with repetition are very important for the Wald-Wolfowitz statistical test used in economics and biology, which we will study in the Statistics section. We will introduce this kind of combination directly with an example, using an ingenious approach that we owe to the physicist and 1938 Nobel Prize in Physics laureate Enrico Fermi.

Consider the set $\{a, b, c, d, e, f\}$, with $n = 6$ elements, from which we draw $p = 8$ elements with possible repetition. We would like to calculate the number of combinations with repetition of 8 elements taken from a starting set of cardinality 6.

Consider, for example, the following three combinations:

aabbbeef

bbdddeee

bbbddddd

where, as the order of the elements does not matter, we have grouped identical elements to facilitate reading. Now represent all the above elements by the same symbol "0" and separate the groups of each kind of element by bars (this is Enrico Fermi's trick). Thus, when one or more elements are not included in a combination, we still write the separation bars (one bar between each pair of consecutive kinds, whether or not a group is empty). The three combinations above can therefore be written as:

00|000|||00|0

|00||000|000|

|000||00000||

We see above that in each case there are eight "0" (logical...) but also always five "|". The number of combinations with repetition of 8 elements taken from a starting set of 6 elements is therefore equal to the number of permutations with repetition of 8 + 5 = 13 symbols, so:

$$\frac{13!}{5!\,8!} = 1287$$

We also see that, in the general case, the number of combinations with repetition (where order is not taken into account) can be written:

$$\frac{(n+p-1)!}{(n-1)!\,p!}$$

It is traditional to write this:

$$\Gamma_p^n = \frac{(n+p-1)!}{(n-1)!\,p!}$$

We also see that:

$$\Gamma_p^n = \frac{(n+p-1)!}{(n-1)!\,p!} = \frac{(n+p-1)!}{((n+p-1)-p)!\,p!} = C_p^{n+p-1}$$

which we also sometimes write:

$$\Gamma_p^n = C_p^{n+p-1} = C_{n-1}^{n+p-1} = \frac{(n+p-1)!}{(n-1)!\,p!}$$

To summarize:

| Type | Expression |
|---|---|
| Simple arrangement with repetitions | $\bar{A}_n^m = n^m$ |
| Simple arrangement without repetitions | $A_n^p = \dfrac{n!}{(n-p)!}$ |
| Simple permutation without repetitions | $A_n = n!$ |
| Simple permutation with repetitions | $A_n(n_1, n_2, \dots, n_k) = \dfrac{n!}{n_1!\,n_2! \cdots n_k!}$ |
| Simple combination without repetitions (case of the simple arrangement without repetitions where the order is not taken into account) | $C_p^n = \dbinom{n}{p} = \dfrac{n!}{p!\,(n-p)!}$ |
| Simple combination with repetitions (case of the simple permutation with repetition where the order is not taken into account) | $\Gamma_p^n = C_p^{n+p-1} = \dfrac{(n+p-1)!}{(n-1)!\,p!}$ |

Table 4.15 - Summary of the main Combinatorial Analysis cases

6.6 Markov Chains

Markov chains are simple but powerful probabilistic and statistical tools, for which the choice of mathematical presentation can sometimes be a nightmare... We will try here to simplify the notation as much as possible to introduce this great tool, widely used in business to manage supply chains, in queuing theory for call centers or checkout counters, in failure theory for preventive maintenance, in statistical physics and biological engineering, and also in time series analysis and forecasting (the list goes on; for more details the reader should refer to the relevant chapters of this book...).

Definitions (#80):

D1. We denote by $\{X(t)\}_{t \in T}$ a probabilistic process, function of time, whose value at any time depends on the outcome of a random experiment. Thus, at each time t, $X(t)$ is a random variable, and the family $\{X(t)\}_{t \in T}$ is named a "stochastic process" (for more details on financial applications see the chapter Economy).

D2.
If we consider a discrete time, we then speak of a "discrete time stochastic process", denoted $\{X_n\}_{n \in \mathbb{N}}$.

D3. If we further assume that the random variables $X_n$ can take only a discrete set of values, we then speak of a "process in discrete time and discrete space".

Remark

It is quite possible, as in the study of communication flows (see section Quantitative Management), to have a process in continuous time and discrete state space.

Definition (#81): $\{X_n\}_{n \in \mathbb{N}}$ is a "Markov chain" if and only if:

$$P(X_n = j \mid X_{n-1} = i_{n-1}, X_{n-2} = i_{n-2}, \dots, X_0 = i_0) = P(X_n = j \mid X_{n-1} = i_{n-1})$$

In other words (it is very easy!), the probability that the chain is in a certain state at the n-th step of the process depends only on the state of the process at step $n-1$ and not on any previous steps!

Remark

More generally, in probability theory a stochastic process satisfies the Markov property if and only if the conditional probability distribution of future states, given the present state, depends only on the present state and not on past states, as in the relation above. A process with this property is also called a "Markov process".

Definition (#82): A "homogeneous Markov chain" is a chain such that the probability of going to a certain state at the n-th step is independent of n. In other words, the probability distribution characterizing the next step does not depend on time: at every step, the same probability distribution characterizes the transition to the next step.

We can then define the law of "transition probability" from a state i to a state j by:

$$p_{ij} = P(X_n = j \mid X_{n-1} = i)$$

It is then natural to define the "transition matrix" or "stochastic matrix":

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{pmatrix}$$

as the matrix that contains all possible transition probabilities between the states of an oriented graph.
Markov chains can be represented graphically as an oriented graph G (see section Graph Theory), sometimes named an "automaton", whose vertices are the states i and whose edges are the oriented couples (i, j). We then associate to each couple an oriented arc and a transition probability.

The reader has seen in the previous example that we have the trivial property (by construction!) that the sum of the terms (probabilities) of a row of the matrix P is always equal to 1 (and therefore the sum of the terms of a column of the transpose of the matrix is also equal to 1):

$$\sum_j p_{ij} = 1$$

and that the matrix is positive (meaning that all its terms are non-negative).

Remark

Remember that the sum of the probabilities of each column of the transpose of the stochastic matrix is always equal to 1!

The analysis of the transient state (or: random walk) of a Markov chain consists in determining (or imposing!) the column matrix (vector) $p(n)$ of the probabilities of being in each state at the n-th step of the walk:

$$p(n) = \begin{pmatrix} p_1(n) \\ p_2(n) \\ \vdots \\ p_m(n) \end{pmatrix}$$

whose components always sum to 1 (since the sum of the probabilities of being in any of the vertices of the graph at a given time/step must be equal to 100%).

We frequently name this column matrix a "stochastic vector" or "probability measure on the vertices".

Theorem 4.28. We want to prove that the total probability of this stochastic vector always remains equal to 1.

Proof 4.28.1. If $p(n)$ is a stochastic vector, then its image:

$$p(n+1) = P^T p(n)$$

is also a stochastic vector. Indeed, $p_i(n+1) \geq 0$ because:

$$p_i(n+1) = \sum_j p_{ji}\, p_j(n)$$

is a sum of positive or zero values. Furthermore, we find:

$$\sum_i p_i(n+1) = \sum_i \sum_j p_{ji}\, p_j(n) = \sum_j \sum_i p_{ji}\, p_j(n) = \sum_j \Big(\sum_i p_{ji}\Big) p_j(n) = \sum_j 1 \cdot p_j(n) = \sum_j p_j(n) = 1$$

Q.E.D.

This probability vector, whose components are positive or zero, depends (it is pretty intuitive) on the transition matrix P and on the vector of initial probabilities $p(0)$.
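The theorem can be observed step by step on a concrete matrix. The 3-state transition matrix below is a hypothetical example chosen for illustration (it does not come from the book); exact fractions are used so that "the components sum to 1" can be tested with strict equality rather than a floating-point tolerance. The helpers `is_stochastic_matrix` and `step` are hypothetical names, and `step` implements $p_i(n+1) = \sum_j p_{ji}\, p_j(n)$, i.e. $p(n+1) = P^T p(n)$.

```python
from fractions import Fraction as F

# Hypothetical 3-state transition matrix (rows sum to 1); not taken from the book.
P = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(0), F(1, 2), F(1, 2)],
]


def is_stochastic_matrix(matrix) -> bool:
    """All terms non-negative and every row summing to 1."""
    return all(x >= 0 for row in matrix for x in row) and all(sum(row) == 1 for row in matrix)


def step(matrix, p):
    """p(n+1) = P^T p(n), i.e. p_i(n+1) = sum_j p_ji p_j(n)."""
    size = len(p)
    return [sum(matrix[j][i] * p[j] for j in range(size)) for i in range(size)]


assert is_stochastic_matrix(P)
p = [F(1), F(0), F(0)]  # p(0): start in state 1 with certainty
for n in range(1, 6):
    p = step(P, p)
    # The theorem: p(n) remains a stochastic vector at every step.
    assert all(x >= 0 for x in p) and sum(p) == 1
    print(n, [str(x) for x in p])
```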
Although it can be proved (Perron-Frobenius theorem), the reader may verify on a practical case (computerized or not!) that for any stochastic matrix P there exists a probability vector, traditionally denoted $\pi$, such that:

$$P^T \pi = \pi$$

(this vector is unique when the chain is irreducible and, if the chain is also aperiodic, $p(n)$ converges to it whatever the initial vector $p(0)$).

Such a probability measure $\pi$ satisfying the above equation is called an "invariant measure", "stationary measure" or "equilibrium measure"; it represents the equilibrium state of the system.

In terms of linear algebra (see section Linear Algebra), $\pi$ is an eigenvector of $P^T$ for the eigenvalue 1.

We will see a trivial example in the Graph Theory section, which will be developed again in detail in the section of Game and Decision Theory in the context of pharmaco-economics, and in the section of Software Engineering when we study the fundamentals of the Google PageRank algorithm. But also note that Markov chains are used, for example, in meteorology (or for computer password cracking), in medicine, finance, transportation, marketing, etc.

In the field of language analysis, from the frequency analysis of sequences of words, computers are also able to build Markov chains and therefore propose more plausible wording during automatic grammar correction or during the written transcription of oral presentations.

Definitions (#83):

D1. A Markov chain is said to be an "irreducible Markov chain" if every state can be reached from every other state (this is the case of the example in the figure above).

D2. A Markov chain is said to be an "absorbing Markov chain" if one of the states of the chain absorbs the transitions (once the chain enters it, it never leaves, to put things more simply!).
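A practical way to find the stationary measure is power iteration: apply $p \leftarrow P^T p$ repeatedly until the vector stops moving. This converges when the chain is irreducible and aperiodic, which holds for the hypothetical 3-state matrix used below (an illustrative example, not from the book; its eigenvalues are 1, 1/3 and 0). The helpers `step` and `stationary` are hypothetical names; the final check verifies exactly, with fractions, that $\pi = (1/4, 3/8, 3/8)$ satisfies $P^T \pi = \pi$.

```python
from fractions import Fraction as F

# Hypothetical irreducible, aperiodic 3-state chain (illustration only).
P_EXACT = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(0), F(1, 2), F(1, 2)],
]
P = [[float(x) for x in row] for row in P_EXACT]


def step(matrix, p):
    """One application of P^T: component i is sum_j p_ji * p_j."""
    size = len(p)
    return [sum(matrix[j][i] * p[j] for j in range(size)) for i in range(size)]


def stationary(matrix, tol=1e-12, max_iter=10_000):
    """Power iteration p <- P^T p from the uniform vector until it stops moving."""
    size = len(matrix)
    p = [1.0 / size] * size
    for _ in range(max_iter):
        nxt = step(matrix, p)
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    raise RuntimeError("no convergence: the chain may be reducible or periodic")


pi = stationary(P)
print([round(x, 6) for x in pi])  # → [0.25, 0.375, 0.375]

# Exact check that pi = (1/4, 3/8, 3/8) satisfies P^T pi = pi.
PI_EXACT = [F(1, 4), F(3, 8), F(3, 8)]
assert step(P_EXACT, PI_EXACT) == PI_EXACT
```

For large chains (PageRank-like problems), the same iteration is used because it only needs matrix-vector products, not a full eigen-decomposition.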
Statistics

Statistics is a science that concerns the systematic grouping of facts or recurring events that lend themselves to a numerical or qualitative assessment over time according to a given law. In industry and in the economy in general, statistics is a science that helps to make valid inferences in an uncertain environment.

You should know that, among all areas of mathematics, the one most widely used in business and research centres is Statistics, especially since software greatly facilitates the calculations! This is why this chapter is one of the biggest in this book, even though only the basic concepts are presented!

Note also that Statistics has a very bad reputation at university because the notations are often confusing and vary greatly from one teacher to another, from one book to another and from one practitioner to another. Strictly speaking, it should comply with the vocabulary and notation of the ISO 3534-1:2006 standard; unfortunately this chapter was written before the publication of this standard... a certain period of adaptation will be necessary to obtain full compliance.

It is perhaps needless to say that statistics is widely used in engineering, theoretical physics, fundamental physics, econometrics, project management, the process industry, life and non-life insurance, actuarial science and database analysis (very often with Microsoft Excel... unfortunately...), and the list goes on. We will also quite often meet the tools presented here in the chapters on Fluid Mechanics, Thermodynamics, Technical Management, Industrial Engineering and Economy (especially in the last two). The reader can refer to them for concrete practical applications of the most important theoretical elements that will be seen here.
Note also that, in addition to the few simple examples on these pages, many other application examples are given on the exercise server of the companion website in the categories Probability and Statistics, Industrial Engineering, Econometrics and Management Techniques.

Definition (#84): The main purpose of Statistics is to determine the characteristics of a given population from the study of a part of that population, called a "sample" or "representative sample". The determination of these characteristics should enable statistics to be a decision-support tool!

Remark

Data processing is the domain of "descriptive statistics". The interpretation of data through estimators is called "statistical inference" (or "inferential statistics"), and the analysis of mass data "frequentist statistics", as opposed to Bayesian inference (see section Probabilities).

When we observe an event taking some given factors into account, it can happen that a second observation takes place in conditions that seem identical. By repeating these steps several times on different, supposedly similar, objects, we find that the observed results are statistically distributed around a mean value that is ultimately the most likely possible outcome. In practice, however, we sometimes perform a single measurement, and the goal is then to determine the error we make by adopting it as the mean value. This determination requires knowledge of the type of statistical distribution we are dealing with, and that is what we will focus on (among other things) here (at least the basics!).

There are several common methodological approaches when we face randomness (less common ones are not mentioned yet):

1. A first approach is simply to ignore the random elements, for the simple reason that we do not know how to integrate them. We then use the "scenarios method", also called "deterministic simulation".
This is typically the tool used by financial managers and non-graduate managers with tools like Microsoft Excel (which includes a scenario manager) or MS Project (which includes a tool to manage deterministic optimistic, pessimistic and expected scenarios).

2. A second possible approach, when we do not know how to associate probabilities with specific future random events, is game theory (see section Game and Decision Theory), where semi-empirical selection criteria are used, such as the maximax, minimax, Laplace and Savage criteria.

3. When we can associate probabilities with random events, whether these probabilities come from calculations or measurements, or are based on experience of previous situations similar to the current one, we can use descriptive and inferential statistics (the content of this chapter) to extract usable and relevant information from this mass of acquired data.

4. A last approach, when we know the probabilities of the events that occur in response to strategic choices, is decision theory (see section Game and Decision Theory).

Remarks
R1. Without mathematical statistics, a calculation on data (e.g. an average) is only a "point indicator". It is mathematical statistics that gives it the status of an estimator whose bias, uncertainty and other statistical characteristics are controlled. We generally seek to ensure that the estimator is unbiased, consistent (convergent) and efficient (we will see what exactly all this means later, in our study of estimators).

R2. When we communicate a statistic, it should be mandatory to specify the confidence interval, the p-value and the size of the studied sample (absolute statistics), together with its detailed characteristics, and to make the sources and the data protocol available; otherwise it has almost no scientific value (we will see all these concepts in detail further below).
A common mistake is to communicate in relative values. For example, in a test group of 1,000 women, 5 will die of breast cancer without screening and 4 with screening. Some will say a little too quickly (typically physicians...) that screening saves 20% of women (a relative value, since one of the five could have been saved...). In fact this is misleading, since the absolute benefit of screening is tiny: 1 woman in 1,000, i.e. 0.1 percentage point!

R3. If you have a teacher or trainer who dares to teach you statistics and probability only with examples based on gambling (cards, dice, matches, coins, etc.), dismiss or denounce him. Examples should normally be based on industry, the economy or R&D, i.e. on areas that businesses deal with daily!

7.1 Samples

During the statistical analysis of sets of information, the way the sample is selected is as important as the way it is analyzed. The sample must be representative of the population (we are not necessarily referring to human populations!). Random sampling is the best way to achieve this.

Definitions (#85):
D1. The statistician always starts from the observation of a finite number of elements, which we call the "population". The n observed elements are all of the same nature, but this nature can differ greatly from one population to another.

D2. We are in the presence of a "quantitative character" when each observed element is explicitly subjected to the same measurement. With a given quantitative character we associate a continuous or discrete "quantitative variable", which summarizes all the possible values the measurement can take (such information being represented by functions like the Gauss-Laplace distribution, the beta distribution, the Poisson distribution, etc.).

Remark
We will come back to the concepts of "variable" and "distribution" a little further on...

D3.
We are in the presence of a "qualitative character" when each observed element is explicitly assigned to a single "modality" from a set of mutually exclusive modalities (e.g. man | woman), which allows all studied elements to be classified from a given point of view (such information being represented by bar charts, pie charts, bubble charts, etc.). All modalities of a character can be established a priori, before the survey (a list, a nomenclature, a code), or after the survey. A studied population can be described by a mixed character, i.e. a set of modalities such as gender, wage range, age, number of children or marital status for an individual.

D4. A "random sample" is, by default (without further precision), a sample in which all individuals of a population have the same chance, or "equally likely probability" (and we emphasize that this probability must be equal), of ending up in the sample.

D5. Conversely, a sample whose elements were not chosen randomly is called a "biased sample" (otherwise we talk about a "non-biased sample").

Remark
A small representative sample is by far preferable to a large biased sample. But when sample sizes are small, chance can give a result worse than the biased one...

7.2 Averages

The concept of "average" or "central tendency" (financial analysts call it a "measure of location"...) is, together with the notion of "variable", at the basis of statistics.

This notion seems very familiar, and we talk about it a lot without asking too many questions. But there are various qualifiers (we emphasize that these are only qualifiers!) that distinguish the ways of solving a problem of calculating an average.

Thus you must be very, very careful with the calculation of averages, because in business there is a tendency to rush and systematically use the arithmetic mean without thinking, which can lead to serious errors!
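To make this warning concrete, here is a minimal Python sketch (Python 3.8+ for `statistics.geometric_mean`) showing how a blind arithmetic mean can mislead. All figures are invented for illustration only; the four cases anticipate several of the classic mistakes discussed next.

```python
from statistics import geometric_mean, mean, median

# All figures below are hypothetical, chosen only to illustrate the pitfalls.

# 1) The arithmetic mean is not the value that splits the population in two.
wages = [1200, 1300, 1400, 1500, 9000]
print("mean:", mean(wages), "median:", median(wages))

# 2) The average of subsidiary averages is not the global average
#    when the subsidiaries have different head counts.
subsidiary_a = [4000, 4000, 4000, 4000]  # 4 employees
subsidiary_b = [8000]                    # 1 employee
avg_of_avgs = mean([mean(subsidiary_a), mean(subsidiary_b)])
global_avg = mean(subsidiary_a + subsidiary_b)
print("average of averages:", avg_of_avgs, "global average:", global_avg)

# 3) The average of ratios is not the ratio of the averages.
goals = [100, 10]
realisations = [50, 10]
avg_of_ratios = mean([g / r for g, r in zip(goals, realisations)])
ratio_of_avgs = mean(goals) / mean(realisations)
print("average of ratios:", avg_of_ratios, "ratio of averages:", ratio_of_avgs)

# 4) Growth of +50% then -50%: the arithmetic mean of the rates says 0%,
#    but the capital is multiplied by 1.5 * 0.5 = 0.75 over the two periods.
rates = [0.50, -0.50]
arithmetic_rate = mean(rates)
geometric_rate = geometric_mean([1 + r for r in rates]) - 1
print("arithmetic rate:", arithmetic_rate, "geometric rate:", round(geometric_rate, 4))
```

In each case the "obvious" arithmetic mean answers a different question from the one actually asked; the rest of this section explains which average fits which question.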
A nice example (by analogy) is that a considerable number of laws only require moderate pollution levels per year, whereas, for example, smoking one cigarette per day for 365 days does not have the same impact as smoking 365 cigarettes in a single day, although both give the same average over a year... This is clear evidence of the legislator's statistical incompetence.

Here is a small sample of common mistakes:
• Considering that the arithmetic mean is the value that divides the population into two equal parts (whereas it is the median that does this).
• Considering that the average of ratios of the type goals/realisations is equal to the ratio of the average of the goals to the average of the realisations (it is not the same thing!).
• Considering that the average of the average salaries of different subsidiaries is equal to the global average salary (this is guaranteed only when each subsidiary of the company has the same number of employees).
• Considering that the average of the row averages of a table is always equal to the average of its column averages (this is guaranteed only when no cell of the table is empty).
• Calculating the average growth of revenue in % with the arithmetic mean (the geometric mean must be used).
• etc.

We will see below different averages, with examples related to arithmetic, enumeration, physics, econometrics, geometry and sociology. The reader will find other practical examples by browsing the entire book.

Definitions (#86): Given n real numbers $x_i$, we have:
D1.
The "arithmetic average" or "sample average" (the best known) is defined as the quotient of the sum of the n observed values $x_i$ by the total size n of the sample:

$$\mu_a=\frac{1}{n}\sum_{i=1}^{n}x_i \tag{7.1}$$

It is very often written $\bar{x}$ or $\hat{\mu}$ and, for any discrete or continuous statistical distribution, it is an unbiased estimator of the mean.

The arithmetic average is the value that each member of a set of measurements would have if the total had to remain equal to the product of the arithmetic average by the number of members.

If some values occur more than once in the measurements, with r distinct values $x_i$ each observed $n_i$ times, the arithmetic mean is often formally written as follows:

$$\mu_a=\frac{1}{n}\sum_{i=1}^{r}n_ix_i=\frac{\sum_{i=1}^{r}n_ix_i}{\sum_{i=1}^{r}n_i} \tag{7.2}$$

and is then named the "weighted average". Finally, we could add that, in this approach, the weighted average is named the "mathematical mean" or just the "mean" in the study of probabilities.

We may as well use the frequencies of occurrence of the observed values, named "class frequencies":

$$f_i=\frac{n_i}{n} \tag{7.3}$$

so that we get another equivalent definition, named the "weighted average by the class frequencies":

$$\mu_a=\frac{1}{n}\sum_{i=1}^{r}n_ix_i=\sum_{i=1}^{r}\frac{n_i}{n}x_i=\sum_{i=1}^{r}f_ix_i \tag{7.4}$$

Before continuing, it is important to know that in statistics it is useful, and often necessary, to group measurements/data into class intervals of a given width (see the examples below). We often have to make several attempts to choose the intervals, even though there are semi-empirical formulas for choosing the number of classes when n values are available. One of these semi-empirical rules, used by many practitioners, is to retain the smallest integer number k of classes such that:

$$2^{k}\geq n \tag{7.5}$$

The width of the class interval is then obtained by dividing the range (the difference between the maximum and minimum measured values) by k. By convention and, rigorously (a convention rarely respected in notations), a class interval is closed on the left and open on the right (see section Functional Analysis):

$$[x_i,x_{i+1}[ \tag{7.6}$$

This empirical rule is called "Sturges' rule" and is based on the following reasoning: we assume that the binomial coefficient $C_k^i$ gives the number of individuals in the i-th interval of an ideal histogram with k intervals (the reader can check this simply with spreadsheet software such as Microsoft Excel 11.8346 and the COMBIN(k,i) function). As k becomes large, this histogram looks more and more like a continuous curve called the "Normal curve" or "bell curve", as we will see later. Therefore, based on the binomial theorem (see section Calculus), we have:

$$n=\sum_{i=0}^{k}C_k^i=\sum_{i=0}^{k}C_k^i\,1^{k-i}\,1^{i}=(1+1)^k=2^k \tag{7.7}$$

Then, for each interval i, the practitioner traditionally takes the midpoint between the lower and upper limits for the calculation and multiplies it by the corresponding class frequency $f_i$. Therefore, grouping into class frequencies implies that:
(a) The weighted average computed with the class frequencies (and class midpoints) generally differs from the arithmetic average of the raw data.
(b) Because of the approximation seen above, it is a worse indicator than the arithmetic average...
(c) It is very sensitive to the choice of the number of classes, and therefore quite poor in this respect.

There are many other empirical rules for the discretization of random variables. For example, the XLStat software offers no fewer than 10 rules (constant amplitude, Fisher algorithm, k-means, 20/80, etc.).

Later, we will see two very important properties of the arithmetic average and of the mean that you absolutely have to understand (the weighted average of deviations from the average and the average of deviations from the average).

Remark
The "mode", denoted Mod or simply $M_0$, is defined as the value that appears most often in a set of data.
In Microsoft Excel 11.8346, it is important to know that the MODE() function returns the first value, in the order of the values, that has the largest number of occurrences, and therefore assumes a unimodal distribution.

D2. The "median" or "middle value", denoted $M_e$, is the value that cuts the population values into two equal parts. In the case of a continuous statistical distribution f(x) of a random variable X, it is the value that has a 50% cumulative probability of occurring (we will see the concept of statistical distribution in detail further on):

$$P(X<M_e)=P(X>M_e)=\int_{-\infty}^{M_e}f(x)\,\mathrm{d}x=\int_{M_e}^{+\infty}f(x)\,\mathrm{d}x=0.5 \tag{7.8}$$

In the case of a series of ordered values $x_1, x_2, \ldots, x_n$, the median is therefore by definition a value such that there are as many values greater than or equal to it as there are values less than or equal to it.

Remarks
R1. The median is mainly used for skewed distributions, because it represents them better than the arithmetic average.
R2. In practice the median is often not a single value (at least when n is even). Indeed, between the values of ranks n/2 and n/2 + 1 there is an infinite number of values that cut the population in half.

More rigorously:
• If the number of terms is odd, i.e. of the form 2n + 1, the median of the series is the term of rank n + 1 (whether or not the terms are all distinct!).
• If the number of terms is even, i.e. of the form 2n, the median of the series is the half-sum (arithmetic average) of the terms of rank n and n + 1 (whether or not the terms are all distinct!).

In any case, it follows from this definition that at least 50% of the terms of the series are less than or equal to the median, and at least 50% of the terms of the series are greater than or equal to the median.
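The odd/even rule for the median and the "first most frequent value" behaviour described for MODE() can be sketched in a few lines of Python. The function names `median` and `first_mode` are hypothetical helpers written for this illustration; the standard-library `statistics.median` and `statistics.mode` (Python 3.8+, which returns the first mode encountered) are used only as a cross-check.

```python
import statistics
from collections import Counter


def median(values):
    """Median of a finite series using the odd/even rule above."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("the median of an empty series is undefined")
    m = n // 2
    if n % 2 == 1:
        # n = 2m + 1 terms: the median is the term of rank m + 1 (index m, 0-based)
        return xs[m]
    # n = 2m terms: half-sum of the terms of rank m and m + 1 (indices m - 1 and m)
    return (xs[m - 1] + xs[m]) / 2


def first_mode(values):
    """Most frequent value; ties are broken by order of first appearance,
    i.e. the first value having the largest number of occurrences."""
    if not values:
        raise ValueError("the mode of an empty series is undefined")
    counts = Counter(values)
    top = max(counts.values())
    return next(v for v in values if counts[v] == top)


print(median([3, 1, 2]))            # 2   (odd number of terms)
print(median([4, 1, 3, 2]))         # 2.5 (even number of terms)
print(first_mode([5, 2, 2, 5, 7]))  # 5   (5 and 2 tie; 5 appears first)
# Cross-check against the standard library
print(statistics.median([4, 1, 3, 2]), statistics.mode([5, 2, 2, 5, 7]))
```

Applied to the 17 wages of the table that follows, `median` returns the 9th ordered value.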
For example, consider the table of wages below:

| Employee N° | Wage | Cumulated employees | % cumulated employees |
|---|---|---|---|
| 1 | 1,200 | 1 | 6% |
| 2 | 1,220 | 2 | 12% |
| 3 | 1,250 | 3 | 18% |
| 4 | 1,300 | 4 | 24% |
| 5 | 1,350 | 5 | 29% |
| 6 | 1,450 | 6 | 35% |
| 7 | 1,450 | 7 | 41% |
| 8 | 1,560 | 8 | 47% |
| 9 | 1,600 | 9 | 53% |
| 10 | 1,800 | 10 | 59% |
| 11 | 1,900 | 11 | 65% |
| 12 | 2,150 | 12 | 71% |
| 13 | 2,310 | 13 | 76% |
| 14 | 2,610 | 14 | 82% |
| 15 | 3,000 | 15 | 88% |
| 16 | 3,400 | 16 | 94% |
| 17 | 4,800 | 17 | 100% |

Table 4.16 - Identification of the median

The table contains an odd number 2n + 1 = 17 of values (so n = 8). The median of the series is therefore the term of rank n + 1 = 9, i.e. 1,600.- (the result given by any spreadsheet software). The arithmetic average is in this case about 2,020.-.

In direct relation with the median, it is important to define the following concept to understand the underlying mechanism:

Definition (#87): Given a statistical series $x_1, x_2, \ldots, x_i, \ldots, x_n$, we call "dispersion of absolute differences" around x the number e'(x) defined by:

$$e'(x)=\sum_{i=1}^{n}\lvert x_i-x\rvert \tag{7.9}$$

e'(x) is minimal for the value of x that is, overall, closest to the values $x_i$ in the sense of the absolute error. The median is the value that achieves this minimum (extremum)! The idea is then to study the variations of this function to find the position of the extremum. Indeed, with the values sorted in ascending order, we can write:

$$\forall x\in[x_r,x_{r+1}],\ r\in\{1,2,\ldots,n-1\}:\quad e'(x)=\sum_{i=1}^{r}\lvert x_i-x\rvert+\sum_{i=r+1}^{n}\lvert x_i-x\rvert \tag{7.10}$$

Then, since $x_i\leq x$ for $i\leq r$ and $x_i\geq x$ for $i\geq r+1$:

$$\begin{aligned}e'(x)&=\sum_{i=1}^{r}(x-x_i)+\sum_{i=r+1}^{n}(x_i-x)\\&=\left[rx-(x_1+x_2+\ldots+x_r)\right]+\left[(x_{r+1}+\ldots+x_n)-(n-r)x\right]\\&=(2r-n)x+(x_{r+1}+\ldots+x_n)-(x_1+x_2+\ldots+x_r)\end{aligned} \tag{7.11}$$

What allows us to drop the absolute values is simply the choice of the index r: the ordered series can always be split into two parts, everything that is less than or equal to $x_r$ and everything that is greater than or equal to $x_{r+1}$ (i.e. around the median, by anticipation...).
e'(x) is therefore a piecewise affine function (for fixed values of r and n it is the equation of a line), where by analogy the factor:

$$2r-n \tag{7.12}$$

is the slope of the function and:

$$(x_{r+1}+\ldots+x_n)-(x_1+x_2+\ldots+x_r) \tag{7.13}$$

is the y-intercept (ordinate at the origin).

The function is decreasing (negative slope) while r is less than n/2 and increasing when r is greater than n/2 (it passes through an extremum!). Since n is an integer, we distinguish two cases of interest:
• If n is even, we can write n = 2n'. The slope can then be written $2(r-n')$ and it is equal to zero if r = n'. Since the result is valid by construction only for $x\in[x_r,x_{r+1}]$, e'(x) is constant on $[x_{n'},x_{n'+1}]$, so the minimum is reached on this whole interval, and by convention we take its middle (the arithmetic average of the two terms).
• If n is odd, we can write n = 2n' + 1 (we cut the series into two equal parts). The slope can then be written $2r-2n'-1$, and it would vanish only for r = n' + 1/2, which is not an integer. The slope is therefore negative for r ≤ n' and positive for r ≥ n' + 1, so the minimum is reached exactly at the middle value $x_{n'+1}$, which is the median.

We recover the median in both cases. We will also see later how the median is defined for a continuous random variable (the underlying idea is the same).

There is another practical case, where the statistician only has values grouped in statistical class intervals at his disposal. The procedure for determining the median is then different.

When we only have values grouped in statistical class intervals, the abscissa of the median generally lies within a class. To obtain a more accurate value of the median, we then perform a linear interpolation. This is what we call the "linear interpolation method of the median". The median value can be read from the graph or calculated analytically.
Indeed, consider the graph of the cumulative probability F(x) by class intervals below, where the bounds of the intervals are connected by straight lines:

Figure 4.67 - Graphical representation of the estimation of the median by linear interpolation

The median $M_e$ is obviously located at the abscissa where the cumulative probability reaches 50% (0.5). Thus, by applying the basics of functional analysis (simply observing that, inside the class containing the median, the slope of the segment from the lower bound to the median is the same as the slope of the whole segment), we have, for the example of the figure (class [2, 4[ with F(2) = 0.2 and F(4) = 0.7):

$$\frac{\Delta x}{\Delta y}=\frac{M_e-2}{0.5-0.2}=\frac{4-2}{0.7-0.2} \tag{7.14}$$

which we usually write, for a median class [a, b[:

$$\frac{M_e-a}{0.5-F(a)}=\frac{b-a}{F(b)-F(a)} \tag{7.15}$$

Thus the value of the median is:

$$M_e=a+(b-a)\,\frac{0.5-F(a)}{F(b)-F(a)} \tag{7.16}$$

Consider the following table, which we will see again much later in this chapter:

| Price of tickets | Number of tickets | Cumulated number of tickets | Cumulated relative frequency |
|---|---|---|---|
| [0, 50[ | 668 | 668 | 0.0668 |
| [50, 100[ | 919 | 1,587 | 0.1587 |
| [100, 150[ | 1,498 | 3,085 | 0.3085 |
| [150, 200[ | 1,915 | 5,000 | 0.5000 |
| [200, 250[ | 1,915 | 6,915 | 0.6915 |
| [250, 300[ | 1,498 | 8,413 | 0.8413 |
| [300, 350[ | 919 | 9,332 | 0.9332 |
| [350, 400[ | 440 | 9,772 | 0.9772 |
| [400, +∞[ | 228 | 10,000 | 1 |

Table 4.17 - Identification of the median and the mode

We see that the "median class" is [150, 200[, because the cumulative value 0.5 lies there (rightmost column of the table). Using the previously established relation, the median has the precise value (it is trivial in this particular example, but we still do the calculation...):

$$M_e=150+(200-150)\,\frac{0.5-0.3085}{0.5-0.3085}=200 \tag{7.17}$$

And of course we can do the same with any other percentile!

We can also give a definition to determine the modal value when we only have the frequencies of the class intervals.
To see this, we start from the diagram below, called a "grouped distribution", in frequency bars:

Figure 4.68 - Graphical representation of the estimation of the modal value with class intervals

Using Thales' relations (see section Euclidean Geometry), and noting M the modal value, it follows immediately that:

$$\frac{M-x_i}{\Delta_1}=\frac{x_{i+1}-M}{\Delta_2} \tag{7.18}$$

where $[x_i,x_{i+1}[$ is the modal class, $\Delta_1$ is the difference between the frequency of the modal class and that of the preceding class, and $\Delta_2$ the difference between the frequency of the modal class and that of the following class.

Since, in a proportion, we do not change the value of the ratio by adding the numerators together and the denominators together, we get:

$$\frac{M-x_i}{\Delta_1}=\frac{x_{i+1}-M}{\Delta_2}=\frac{x_{i+1}-x_i}{\Delta_1+\Delta_2} \tag{7.19}$$

We then have:

$$M=x_i+\frac{\Delta_1}{\Delta_1+\Delta_2}\,(x_{i+1}-x_i) \tag{7.20}$$

With the previous example (where the two classes [150, 200[ and [200, 250[ share the maximum count 1,915 and are treated together as the modal class [150, 250[), this gives:

$$M=150+\frac{1{,}915-1{,}498}{(1{,}915-1{,}498)+(1{,}915-1{,}498)}\,(250-150)=150+\frac{1}{2}\cdot 100=200 \tag{7.21}$$

The question that then arises is which of the mean, the mode or the median is appropriate in terms of communication... (normally all three are communicated in corporate reports!). A good example is the labor market: in general the average wage and the median wage are quite different, and the state statistical institutions calculate the median, which many traditional media then explicitly equate with the "arithmetic average" in their news...

Remark
To avoid an arithmetic average that makes little sense, we often calculate a "trimmed average", i.e. an arithmetic average calculated after removing the outliers of the series (using Grubbs' or Dixon's tests).

"Quantiles" generalize the notion of median by cutting the distribution into sets of equal size (of the same cardinality, we might say...), in other words into regular intervals. We define the "quartiles", "deciles" and "percentiles" of the population, ordered in ascending order, which we divide into 4, 10 or 100 parts of the same size. So we talk about the 90th percentile to indicate the value separating the first 90% of the population from the remaining 10%.
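The linear interpolation (7.16), which works for any percentile and not only the median, and the interpolated mode (7.20) can be sketched in Python on the data of Table 4.17. This is a minimal illustration: the helper names `grouped_quantile` and `grouped_mode` are hypothetical, and the open class [400, +∞[ is deliberately left out of the interpolation because it has no upper bound (its 228 tickets only enter through the total of 10,000).

```python
from itertools import accumulate

# Table 4.17: class limits and ticket counts of the closed classes.
# The open class [400, +inf[ (228 tickets) only enters through TOTAL.
BOUNDS = [0, 50, 100, 150, 200, 250, 300, 350, 400]
COUNTS = [668, 919, 1498, 1915, 1915, 1498, 919, 440]
TOTAL = 10_000
# Cumulative relative frequency F at each class limit: F(0) = 0, ..., F(400) = 0.9772
CDF = [0.0] + [c / TOTAL for c in accumulate(COUNTS)]


def grouped_quantile(bounds, cdf, p):
    """p-quantile by linear interpolation inside its class, as in (7.16):
    a + (b - a) * (p - F(a)) / (F(b) - F(a))."""
    if not cdf[0] <= p <= cdf[-1]:
        raise ValueError("p is outside the tabulated cumulative frequencies")
    for j in range(1, len(bounds)):
        if p <= cdf[j]:
            a, b = bounds[j - 1], bounds[j]
            fa, fb = cdf[j - 1], cdf[j]
            if fb == fa:  # empty class: nothing to interpolate
                return a
            return a + (b - a) * (p - fa) / (fb - fa)


def grouped_mode(x_low, x_high, n_modal, n_before, n_after):
    """Interpolated mode (7.20); d1 and d2 are the count differences between
    the modal class and its preceding / following neighbours."""
    d1 = n_modal - n_before
    d2 = n_modal - n_after
    return x_low + d1 / (d1 + d2) * (x_high - x_low)


print(grouped_quantile(BOUNDS, CDF, 0.5))            # median: 200.0
print(round(grouped_quantile(BOUNDS, CDF, 0.9), 2))  # 90th percentile
print(grouped_mode(150, 250, 1915, 1498, 1498))      # 200.0
```

The same function gives quartiles or deciles by changing `p`. Spreadsheet percentile functions work on raw data rather than on grouped classes, so their results can differ slightly.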
Note that Microsoft Excel 11.8346 provides the functions QUARTILE(), PERCENTILE(), MEDIAN() and PERCENTRANK(). It is worth specifying that there are several variants for calculating these percentiles, which explains possible differences between the results of different spreadsheet programs.

This concept is very important in the context of the confidence intervals that we will see much further in this section, and very useful in the field of quality with "box plots" (also called "Box & Whiskers plots"), used to quickly compare ("discriminate", as experts say) two or more populations of data and especially to eliminate outliers (taking the median as the reference simply makes more sense here!).

Figure 4.69 - Box & Whiskers Plot painfully made with Microsoft Excel 11.8346

Or, more explicitly, as it should be in any good statistical software:

Figure 4.70 - Box & Whiskers Plot ideal chart type (Analysis of Train Arrival Delay: extreme outlier, light outlier, 1st quartile, 3rd quartile)

Another very important mental representation of box plots is the following (it gives an idea of the asymmetry of the distribution, as the R software is able to do):

Figure 4.71 - Graphical representation of the mode, median and 1st + 3rd quartile compared to a distribution

The concepts of median, outliers and confidence intervals, which have already been proved and/or just defined, are so significant that there are international standards for their proper use. Let us first cite the standard ISO 16269-7:2001 Median - Estimation and confidence intervals, and also the standard ISO 16269-4:2010 Detection and treatment of outliers.

D3.
By analogy with the median, we define the "medial" as the value (with the values in ascending order) that splits the cumulative sum of the values into two equal masses (i.e. the total sum divided by two).

In the case of our wages example, while the median gives the wage with 50% of the employees below it and 50% above it, the medial tells us how many employees share the first half of the total wage costs (and therefore the wage at which the split occurs) and how many employees share the second half.

| Employee N° | Wage | Cumulated wages | % cumulated wages |
|---|---|---|---|
| 1 | 1,200 | 1,200 | 3.5% |
| 2 | 1,220 | 2,420 | 7% |
| 3 | 1,250 | 3,670 | 10.7% |
| 4 | 1,300 | 4,970 | 14.5% |
| 5 | 1,350 | 6,320 | 18.4% |
| 6 | 1,450 | 7,770 | 22.6% |
| 7 | 1,450 | 9,220 | 26.8% |
| 8 | 1,560 | 10,780 | 31.4% |
| 9 | 1,600 | 12,380 | 36.0% |
| 10 | 1,800 | 14,180 | 41.3% |
| 11 | 1,900 | 16,080 | 46.8% |
| 12 | 2,150 | 18,230 | 53.1% |
| 13 | 2,310 | 20,540 | 59.8% |
| 14 | 2,610 | 23,150 | 67.4% |
| 15 | 3,000 | 26,150 | 76.1% |
| 16 | 3,400 | 29,550 | 86% |
| 17 | 4,800 | 34,350 | 100% |

Table 4.18 - Identification of the medial

The sum of all wages is equal to 34,350, so half of it is 17,175; the medial therefore lies between employees No. 11 and 12, while the median was 1,600. We see that the medial corresponds to 50% of the aggregate. This is a very useful indicator in Pareto or Lorenz analysis (see section Quantitative Management).

D4. The "root mean square", sometimes simply denoted Q, comes from the general relation:

$$\mu_m=\left(\frac{1}{n}\sum_{i=1}^{n}x_i^{m}\right)^{1/m} \tag{7.22}$$

where we take m = 2.

Example: Consider a square of side a and another square of side b. The average area of the two squares equals the area of a square of side Q:

$$\frac{a^2+b^2}{2}=Q^2\quad\Rightarrow\quad Q=\sqrt{\frac{a^2+b^2}{2}} \tag{7.23}$$

In Microsoft Excel 11.8346 you can combine the functions SUMSQ() and COUNT() with the power operator to quickly calculate the root mean square as follows:
=(SUMSQ(...)/COUNT(...))^(1/2)

D5.
The "harmonic mean", sometimes simply denoted H, is defined by:

$$\mu_h=\frac{n}{\sum_{i=1}^{n}\frac{1}{x_i}} \tag{7.24}$$

It is little known, but it often results from simple and relevant arguments (typically the equivalent resistance of an electrical circuit with several resistors in parallel). There is a HARMEAN() function in Microsoft Excel 11.8346 to calculate it.

Example: Consider a distance d travelled in one direction at speed $v_1$ and in the other direction (or not) at speed $v_2$. The average speed is obtained by dividing the total distance 2d by the travel time t:

$$\bar{v}=\frac{2d}{t} \tag{7.25}$$

The time it takes to travel d at speed $v_i$ is simply the quotient:

$$t_i=\frac{d}{v_i} \tag{7.26}$$

The total time is therefore:

$$t=\frac{2d}{\bar{v}}=t_1+t_2=\frac{d}{v_1}+\frac{d}{v_2}\quad\Rightarrow\quad\bar{v}=\frac{2}{\frac{1}{v_1}+\frac{1}{v_2}} \tag{7.27}$$

Whatever the value of d, provided it is the same for both speeds, it cancels out: this is why d disappears! In other words: we use the harmonic mean when we are given ratios (here, distance per unit of time over equal distances).

D6. The "geometric mean", sometimes simply denoted G, is defined in the general case by:

$$\mu_g=\sqrt[n]{\prod_{i=1}^{n}x_i} \tag{7.28}$$

This average is often forgotten by undergraduate employees, but it is famous in the field of finance (see section Economy), which is also why Microsoft Excel 11.8346 has a GEOMEAN() function to calculate it.

A geometric mean is often used when comparing different items (finding a single "figure of merit" for them) when each item has multiple properties with different numeric ranges. For example, the geometric mean can give a meaningful "average" to compare two companies that are each rated from 0 to 5 for their environmental sustainability and from 0 to 100 for their financial viability. If an arithmetic mean were used instead of a geometric mean, financial viability would be given more weight because its numeric range is larger, so a small percentage change in the financial rating (e.g. going from 80 to 90) makes a much larger difference in the arithmetic mean than a large percentage change in environmental sustainability (e.g. going from 2 to 5). The use of a geometric mean "normalizes" the ranges being averaged, so that no range dominates the weighting, and a given percentage change in any of the properties has the same effect on the geometric mean. So a 20% change in environmental sustainability from 4 to 4.8 has the same effect on the geometric mean as a 20% change in financial viability from 60 to 72.

Remark
In 2010, the United Nations Development Programme introduced the geometric mean to compute the Human Development Index. Poor performance in any dimension is directly reflected in the geometric mean. That is to say, a low achievement in one dimension is no longer linearly compensated for by a high achievement in another dimension. The geometric mean reduces the level of substitutability between dimensions and at the same time ensures that a 1% decline in the index of, say, life expectancy has the same impact on the HDI as a 1% decline in the education or income index. Thus, as a basis for comparisons of achievements, this method is also more respectful of the intrinsic differences across the dimensions than a simple average.

As with values equal to 0 (which force the geometric mean to 0), the geometric mean cannot be meaningfully calculated with negative numbers. However, there are several work-arounds for this problem, all of which require converting the negative values into a meaningful positive equivalent. Most often this problem arises when we want to calculate the geometric mean of percent changes in a population or of financial returns that include negative numbers. For example, to calculate the geometric mean of the values +12%, -8%, 0% and +2%, we instead calculate the geometric mean of their decimal multiplier equivalents 1.12, 0.92, 1 and 1.02, which gives a geometric mean of 1.0125.
Subtracting 1 from this value gives the geometric mean +1.25% as a net rate of growth (in financial circles it is called the "Compound Annual Growth Rate", C.A.G.R.).

Example: Suppose that a bank offers an investment paying an interest rate of (X - Y)% for the first year (this is absurd, but it is an example) and an interest rate of (X + Y)% for the second year. At the same time, another bank offers a constant interest rate of X% for two years. We might say a little too quickly that this is the same... In fact the two investments do not have the same profitability!

In the first bank, a capital $C_0$ will give after the first year:

$$\left(1+(X-Y)\%\right)C_0 \tag{7.29}$$

and after the second year:

$$\left(1+(X+Y)\%\right)\left(1+(X-Y)\%\right)C_0 \tag{7.30}$$

In the other bank we will have after one year:

$$\left(1+X\%\right)C_0 \tag{7.31}$$

and after the second year:

$$\left(1+X\%\right)^2C_0 \tag{7.32}$$

and so on... As you can see, the two investments will not give the same result if Y ≠ 0! The equivalent constant rate is not X%, the arithmetic average of (X - Y)% and (X + Y)%. Now, if we write the growth multipliers:

$$r_1=1+(X+Y)\%\quad\text{and}\quad r_2=1+(X-Y)\% \tag{7.33}$$

what is in reality the average value r of the global growth multiplier? After 2 years, the capital is multiplied by $r_1r_2$. If an average exists, we denote it r, and the capital is then multiplied by $r^2$. We then have the relation:

$$r^2=r_1r_2\quad\Leftrightarrow\quad r=\sqrt{r_1r_2} \tag{7.34}$$

This is an example where the geometric mean appears. Forgetting to use the geometric mean is a common mistake in companies, where some employees calculate the arithmetic average rate of increase of a reference value.

D7. The "moving average" of order n is defined as:

$$MM_n=\frac{x_1+x_2+\ldots+x_n}{n} \tag{7.35}$$

each point being computed over the n most recent values of the series. The moving average is used particularly in economics, where it represents the trend of a series of values; a series of N points yields N - n + 1 moving-average points for a period n.
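The general mean (7.22), the geometric mean (7.28) and the moving average (7.35) can be sketched in a few lines of Python, and the sketch reproduces the numeric examples above. The helper names are hypothetical; the standard-library `statistics.harmonic_mean` and `statistics.geometric_mean` (Python 3.8+) would serve equally well. The speeds 60 and 40, the square sides 3 and 4, and the bank rates X = 5 and Y = 3 are assumed values chosen only for illustration.

```python
import math


def power_mean(values, m):
    """General mean (7.22): m = 1 arithmetic, m = 2 root mean square,
    m = -1 harmonic. Values must be positive when m < 0."""
    if m == 0:
        raise ValueError("m = 0 is the geometric-mean limit; use geometric_mean")
    return (sum(x ** m for x in values) / len(values)) ** (1 / m)


def geometric_mean(values):
    """n-th root of the product (7.28), computed through logarithms
    to avoid overflow; all values must be strictly positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def moving_average(series, n):
    """Simple moving average of order n (7.35) over the n most recent values:
    one point per full window, i.e. len(series) - n + 1 points."""
    if not 1 <= n <= len(series):
        raise ValueError("the order n must lie between 1 and len(series)")
    return [sum(series[t - n + 1:t + 1]) / n for t in range(n - 1, len(series))]


# Harmonic mean: same distance at 60 and then at 40 (hypothetical speeds)
print(power_mean([60, 40], -1))            # about 48, not 50
# Root mean square: squares of sides 3 and 4 (hypothetical)
print(power_mean([3, 4], 2))               # side of the "average" square
# Growth rates +12%, -8%, 0%, +2% as multipliers
g = geometric_mean([1.12, 0.92, 1.00, 1.02])
print(round(g, 4))                         # 1.0125, i.e. about +1.25% per period
# Two-bank example with hypothetical X = 5 and Y = 3 (rates in %)
r1, r2 = 1.08, 1.02
print(round(r1 * r2, 4), round(1.05 ** 2, 4))  # 1.1016 1.1025
print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```

The alternating offer multiplies the capital by 1.1016 instead of 1.1025. Its equivalent constant multiplier, `geometric_mean([r1, r2])`, is therefore slightly below 1.05.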
A moving average in finance is calculated from the average of a stock price over a given period: each point of a 100-session moving average is the average of the last 100 prices. This curve, displayed together with the price curve, smooths out the daily changes in the price and makes trends easier to see. Moving averages can be calculated over different periods, which gives short-term trends (MMC, 20 sessions according to the habits of the field), medium-term trends (50-100 sessions) or long-term trends (MML, over 100 sessions):

Figure 4.72 - Graphical representation of a few moving averages

The crossings of the moving averages with the price curve (sampled with a certain granularity) generate (basic) buy or sell signals, depending on the case:
• Buy signal: when the price curve crosses the MM upwards.
• Sell signal: when the price curve crosses the MM downwards.

In addition to the moving average, note that there are many other artificial indicators often used in finance (the R software has a package dedicated only to such indicators), for example the "upside/downside ratio". The idea is the following: you have a financial product (see section Economy) whose current price is $P_c$, for which you have a high-gain target corresponding to a high price, which we denote $P_h$, and, conversely, a potential loss that you expect at a low price $P_l$. Then:

$$UD_r=\frac{P_h-P_c}{P_c-P_l} \tag{7.36}$$

Examples:
E1. A financial product at 10.- with a low price of 5.- and a high price of 15.- has a ratio $UD_r=1$, and therefore the same speculative potential for a gain or a loss of 5.-.
E2. A financial product at 10.- with a low price of 5.- and a high price of 20.- has a ratio $UD_r=2$, and therefore a speculative potential gain twice as large as the potential loss.
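The basic crossover signals and the ratio (7.36) can be sketched in Python. This is only an illustration. The helper names are hypothetical, the price series is invented, and a "crossing" is assumed here to mean a change of sign of the price minus its moving average between two consecutive sessions. Real trading tools add filters on top of this rule.

```python
def sma(prices, n):
    """Simple moving average of order n aligned on the price index;
    None where fewer than n prices are available."""
    out = [None] * len(prices)
    for t in range(n - 1, len(prices)):
        out[t] = sum(prices[t - n + 1:t + 1]) / n
    return out


def crossover_signals(prices, n):
    """Basic signals: 'buy' when the price crosses its moving average upwards,
    'sell' when it crosses it downwards. Returns (index, signal) pairs."""
    ma = sma(prices, n)
    signals = []
    for t in range(n, len(prices)):
        before = prices[t - 1] - ma[t - 1]
        after = prices[t] - ma[t]
        if before <= 0 < after:
            signals.append((t, "buy"))
        elif before >= 0 > after:
            signals.append((t, "sell"))
    return signals


def upside_downside_ratio(p_current, p_high, p_low):
    """UD_r = (P_h - P_c) / (P_c - P_l), equation (7.36)."""
    if p_low >= p_current:
        raise ValueError("the low price must be below the current price")
    return (p_high - p_current) / (p_current - p_low)


prices = [10, 11, 12, 11, 9, 8, 9, 11, 12]  # hypothetical closing prices
print(crossover_signals(prices, 3))         # [(3, 'sell'), (6, 'buy')]
print(upside_downside_ratio(10, 15, 5))     # 1.0 (example E1)
print(upside_downside_ratio(10, 20, 5))     # 2.0 (example E2)
```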
Remark: Some financial institutions recommend rejecting products whose ratio $UD_r$ is below 3. Investors also tend to reject ratios that are too high, which can be a sign of artificial inflation.

D8. The "weighted average" (the moving average and the arithmetic average are just special cases of the weighted average with all $w_i = 1$) is defined by:

$$\mu_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \tag{7.37}$$

It is used for example in geometry to locate the centroid of a polygon, in physics to determine the center of gravity, in statistics to calculate the mean and in more advanced regression techniques, and in project management to forecast task durations. In the general case the weight $w_i$ represents the (measured or arbitrary/empirical) influence of the element $x_i$ relative to the others.

D9. The "functional mean" or "integral average" is defined as:

$$\mu_f = \frac{1}{b-a}\int_a^b f(x)\,dx \tag{7.38}$$

where $\mu_f$ depends on a function f of a real variable that is integrable (see section Differential and Integral Calculus) on the interval [a, b]. It is often used in signal theory (electronics, electrical engineering).

7.2.1 Laplace Smoothing

To come back to the class frequencies seen above, and before studying some mathematical properties of the averages... you should know that when we work with discrete probability distributions we very (very) often meet a typical problem whose source is the size of the population. Consider as an example the case where we have 12 documents and would like to estimate the probability of occurrence of the word "Viagra". On a sample we have the following values:

| Document ID | Word occurrences |
|---|---|
| 1 | 1 |
| 2 | 0 |
| 3 | 2 |
| 4 | 0 |
| 5 | 4 |
| 6 | 6 |
| 7 | 3 |
| 8 | 0 |
| 9 | 6 |
| 10 | 2 |
| 11 | 0 |
| 12 | 1 |

Table 4.19 - Class frequencies of the word

This table can be represented in another way:

| Word occurrences | Documents | Probability |
|---|---|---|
| 0 | 4 | 0.33 |
| 1 | 2 | 0.17 |
| 2 | 2 | 0.17 |
| 3 | 1 | 0.083 |
| 4 | 1 | 0.083 |
| 5 | 0 | 0 |
| 6 | 2 | 0.17 |

Table 4.20 - Respective frequency classes of documents

And here we have a common phenomenon.
There is no record with 5 occurrences of the word of interest. The idea (very common in the field of Data Mining) is then to modify the counts artificially and empirically, using a technique called "Laplace smoothing", which consists in adding k units to the count of each class (here k = 1). With K classes and n observations, the probability of class j then becomes $p_j = (n_j + k)/(n + kK)$, here with n = 12 and K = 7, so that the denominator is 19. Therefore the table becomes:

| Word occurrences | Documents | Probability |
|---|---|---|
| 0 | 5 | 0.26 |
| 1 | 3 | 0.16 |
| 2 | 3 | 0.16 |
| 3 | 2 | 0.11 |
| 4 | 2 | 0.11 |
| 5 | 1 | 0.05 |
| 6 | 3 | 0.16 |

Table 4.21 - Frequency classes of documents with smoothing

Obviously this type of technique is debatable and lies outside the scientific framework... We even hesitated to introduce this technique in the chapter Numerical Methods (with all the other empirical numerical techniques) rather than here...

7.2.2 Properties of Means and Averages

We will now see some relevant properties that connect some of these means and averages, or that are specific to a particular mean/average. The first properties are important, so make sure you understand them:

P1. The calculation of the arithmetic average, the root mean square and the harmonic mean can be generalized using the following expression:

$$\mu_m = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^{m}\right)^{1/m} \tag{7.39}$$

where we see that:

(a) For m = 1, we get the arithmetic average
(b) For m = 2, we get the root mean square
(c) For m = -1, we get the harmonic mean

P2. The arithmetic average has the property of linearity, that is to say (without proof, because it is simple to check), for any constants a and b:

$$\overline{ax + b} = a\bar{x} + b \tag{7.40}$$

This is the statistical version of the corresponding property of the mean in probability, which we will see further on.

P3. The weighted sum of the deviations from the arithmetic average is zero.

Proof 4.28.2. First, by definition, we know that:

$$n = \sum_{i=1}^{r} n_i \quad\text{and}\quad \mu = \frac{1}{n}\sum_{i=1}^{r} n_i x_i \tag{7.41}$$

then we have:

$$\sum_{i=1}^{r} n_i(x_i - \mu) = \sum_{i=1}^{r} n_i x_i - \mu\sum_{i=1}^{r} n_i = \sum_{i=1}^{r} n_i x_i - \left(\frac{1}{n}\sum_{i=1}^{r} n_i x_i\right)n = \sum_{i=1}^{r} n_i x_i - \sum_{i=1}^{r} n_i x_i = 0 \tag{7.42}$$

Thus, this quantity cannot be used as a measure of dispersion! By extension, the arithmetic average of the weighted deviations from the average is also equal to zero:

$$\frac{1}{n}\sum_{i=1}^{r} n_i(x_i - \mu) = 0 \tag{7.43}$$

Q.E.D.

This result is quite important because it will be useful later for a better understanding of the concepts of standard deviation and variance.

P4. Now we would like to prove that:

$$\mu_h < \mu_g < \mu_a < \mu_q \tag{7.44}$$

Remark: Comparisons between the above means/averages and the median, or the weighted or moving averages, do not make sense, which is why we will not compare them.

Proof 4.28.3. First, we consider two nonzero real numbers $x_1$ and $x_2$ such that $x_2 > x_1 > 0$ and then we write:

(a) The arithmetic average:

$$\mu_a = \frac{x_1 + x_2}{2} \tag{7.45}$$

(b) The geometric mean:

$$\mu_g = \sqrt{x_1 x_2} \tag{7.46}$$

(c) The harmonic mean:

$$\frac{1}{\mu_h} = \frac{1}{2}\left(\frac{1}{x_1} + \frac{1}{x_2}\right) = \frac{x_1 + x_2}{2x_1x_2} \Leftrightarrow \mu_h = \frac{2x_1x_2}{x_1 + x_2} \tag{7.47}$$

(d) The root mean square:

$$\mu_q = \sqrt{\frac{x_1^2 + x_2^2}{2}} \tag{7.48}$$

We will start by proving that $\mu_g > \mu_h$ by contradiction, assuming that $\mu_g - \mu_h < 0$:

$$\mu_g - \mu_h = \sqrt{x_1x_2} - \frac{2x_1x_2}{x_1 + x_2} = \frac{\sqrt{x_1x_2}\,x_1 + \sqrt{x_1x_2}\,x_2 - 2x_1x_2}{x_1 + x_2} < 0 \Leftrightarrow \sqrt{x_1x_2}\,x_1 + \sqrt{x_1x_2}\,x_2 < 2x_1x_2 \tag{7.49}$$

Dividing both sides by $x_1x_2 > 0$:

$$\sqrt{\frac{x_1}{x_2}} + \sqrt{\frac{x_2}{x_1}} < 2 \tag{7.50}$$

For convenience we now set $y = \sqrt{x_2/x_1}$, and we know that $y > 1$. We therefore have:

$$\sqrt{\frac{x_1}{x_2}} + \sqrt{\frac{x_2}{x_1}} = \frac{1}{y} + y = \frac{y^2 + 1}{y} \tag{7.51}$$

and remember that we are checking whether it is possible that:

$$\frac{y^2 + 1}{y} < 2 \tag{7.52}$$

We can now easily check this statement with the following equivalences (y > 0):

$$\frac{y^2 + 1}{y} < 2 \Leftrightarrow y^2 + 1 - 2y < 0 \Leftrightarrow (y - 1)^2 < 0 \tag{7.53}$$

This is a contradiction (a square cannot be negative), which invalidates our initial assumption, hence:

$$\mu_g - \mu_h > 0 \Leftrightarrow \mu_g > \mu_h \tag{7.54}$$

Let us now check that $\mu_a > \mu_g$, still under the hypothesis $x_2 > x_1 > 0$.
We now want to prove that:

$$\frac{x_1 + x_2}{2} > \sqrt{x_1x_2} \tag{7.55}$$

We have the following equivalences:

$$\frac{x_1 + x_2}{2} > \sqrt{x_1x_2} \Leftrightarrow (x_1 + x_2)^2 > 4x_1x_2 \Leftrightarrow x_1^2 + x_2^2 - 2x_1x_2 > 0 \Leftrightarrow (x_1 - x_2)^2 > 0 \tag{7.56}$$

and the last expression is obviously true, because the square of a nonzero real number (here $x_1 \neq x_2$) is strictly positive, which verifies our hypothesis:

$$\mu_a - \mu_g > 0 \Leftrightarrow \mu_a > \mu_g \tag{7.57}$$

We now prove that $\mu_q > \mu_a$ by contradiction, assuming that $\mu_q - \mu_a < 0$:

$$\sqrt{\frac{x_1^2 + x_2^2}{2}} - \frac{x_1 + x_2}{2} < 0 \Leftrightarrow \frac{x_1^2 + x_2^2}{2} < \frac{x_1^2 + 2x_1x_2 + x_2^2}{4} \Leftrightarrow 2x_1^2 + 2x_2^2 < x_1^2 + 2x_1x_2 + x_2^2 \Leftrightarrow x_1^2 - 2x_1x_2 + x_2^2 < 0 \Leftrightarrow (x_1 - x_2)^2 < 0 \tag{7.58}$$

But the square of a real number can never be negative, so the assumption is false and:

$$\mu_q - \mu_a > 0 \Leftrightarrow \mu_q > \mu_a \tag{7.59}$$

We then have:

$$\mu_h < \mu_g < \mu_a < \mu_q \tag{7.60}$$

Q.E.D.

Once these inequalities are proved, we can move on to a figure attributed to Archimedes that places three of these averages. The interest of this example is to show that there are some remarkable relations between statistics and geometry (coincidence??).

Figure 4.73 - Starting point for the geometric representation of the various averages

We first write a = AB and b = BC, and O is the midpoint of AC. The circle is drawn with center O and radius OA. D is an intersection of the perpendicular to AC through B with the circle (we can choose either intersection). H is the orthogonal projection of B on OD. Archimedes says that OA is the arithmetic average of a and b, that BD is the geometric mean of a and b, and that DH is the harmonic mean of a and b. Let us prove it (the first statement is almost trivial):

$$OA = \frac{AC}{2} = \frac{a + b}{2} \tag{7.61}$$

Therefore OA is the arithmetic average $\mu_a$ of a and b.
In the triangle ADB, right-angled at B, we have:

$$AD^2 = DB^2 + BA^2 \tag{7.62}$$

Then in the triangle BDC, right-angled at B, we have:

$$DC^2 = BC^2 + DB^2 \tag{7.63}$$

Adding these two relations, we get:

$$2DB^2 + BA^2 + BC^2 = AD^2 + DC^2 \tag{7.64}$$

We know that D is on the circle of diameter AC, so the triangle ADC is right-angled at D. Therefore:

$$AD^2 + DC^2 = AC^2 \tag{7.65}$$

And then, replacing BA and BC by a and b:

$$2DB^2 + a^2 + b^2 = AC^2 = (a + b)^2 \Leftrightarrow 2DB^2 = 2ab \tag{7.66}$$

So finally:

$$DB = \sqrt{ab} \tag{7.67}$$

and therefore DB is the geometric mean $\mu_g$ of a and b.

We now have to prove that DH is the harmonic mean of a and b. First, using the orthogonal projection as studied in the section Vector Calculus, with $\alpha$ the angle between DO and DB (H being the projection of B on DO):

$$\vec{DO}\cdot\vec{DB} = \|\vec{DO}\|\,\|\vec{DB}\|\cos(\alpha) = DO \cdot DH = \frac{a + b}{2}\,DH \tag{7.68}$$

Then we also have (again by orthogonal projection, B being the projection of O on the line DB since OB is perpendicular to DB):

$$\vec{DO}\cdot\vec{DB} = \left(\|\vec{DO}\|\cos(\alpha)\right)\|\vec{DB}\| = DB \cdot DB = DB^2 \tag{7.69}$$

Therefore we have:

$$DB^2 = \frac{a + b}{2}\,DH \tag{7.70}$$

and since $DB = \sqrt{ab}$, we then have:

$$DH = \frac{2ab}{a + b} \tag{7.71}$$

DH is therefore the harmonic mean of a and b. Archimedes was not wrong!

7.3 Types of Variables

When talking about quantitative or qualitative variables, you sometimes hear variables described as categorical (or sometimes nominal), ordinal, or interval. Below we define these terms and explain why they are important.

Definitions (#88):

D1. "Discrete variables" (obtained by counting) take their values in $\mathbb{Z}$: they are analyzed with statistical laws defined on a countable domain, very often the nonnegative integers (the Poisson or hypergeometric distributions are typical cases in industry). They are almost always represented graphically by histograms.

D2. "Continuous variables" (obtained by measuring) take their values in $\mathbb{R}$: they are analyzed with statistical laws defined on an uncountable domain, which may be restricted to positive values or may cover any positive or negative value (typically the Normal distribution in industry).
They are almost always represented graphically by histograms with class intervals.

D3. "Attribute variables" (obtained by classification): they are not numerical data (except when they are coded with digits!) but qualitative data such as Yes, No, Passed, Failed, On time, Late, red, green, blue, black, etc. Binary attribute data follow a Bernoulli distribution, while qualitative variables with more than two levels have no mean or standard deviation (indeed... try to calculate the mean and standard deviation of the qualitative values Red, Green and Pink...). Among attribute variables we mainly distinguish two subtypes:

(a) A "categorical variable" (sometimes called a nominal variable) has two or more categories, with no intrinsic ordering of the categories. For example, gender is a categorical variable with two categories (male and female) and there is no intrinsic ordering to them. Hair color is also a categorical variable with a number of categories (blonde, brown, brunette, red, etc.) and again there is no agreed way to order them from highest to lowest. A purely categorical variable simply allows you to assign categories, but you cannot clearly order them. If the variable has a clear ordering, then it is an ordinal variable, as described below.

(b) An "ordinal variable" is similar to a categorical variable. The difference between the two is that the categories of an ordinal variable have a clear ordering. For example, suppose you have a variable, economic status, with three categories (low, medium and high). In addition to classifying people into these three categories, you can order the categories as low, medium and high. Now consider a variable like educational experience (with values such as elementary school graduate, high school graduate, some college and college graduate). These can also be ordered as elementary school, high school, some college and college graduate.
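As a hedged illustration of why the distinction matters, the following Python sketch (hypothetical data and names) computes a median only for the ordinal variable, after mapping its categories to ranks, and only a mode for the purely categorical one.

```python
from statistics import median_low, mode

# Hypothetical coding of the ordinal variable "economic status": low < medium < high
STATUS_ORDER = ["low", "medium", "high"]
RANK = {label: rank for rank, label in enumerate(STATUS_ORDER)}

def ordinal_median(values):
    """Median of an ordinal variable: median of the ranks, mapped back to a category.

    median_low always returns an existing rank, so no meaningless "average category" appears.
    """
    return STATUS_ORDER[median_low(RANK[v] for v in values)]

status = ["low", "high", "medium", "medium", "low", "medium"]
hair = ["brown", "blonde", "brown", "red", "brunette"]

print(ordinal_median(status))  # medium
print(mode(hair))              # brown: for a categorical variable only counts/mode make sense
```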
Understanding the different types of data is an important discipline for engineers, because it has important implications for the analysis tools and techniques that will be used.

A common question regarding data collection is how much data should be collected. In fact it depends on the desired level of accuracy. We will see much further in this section (with proof!) how to determine mathematically the amount of data to collect.

Now that we are relatively familiar with the concept of average (mean), we can move on to more formal calculations, which will now make sense.

7.3.1 Discrete Variables and Moments

Let X be an independent variable (an individual of a sample whose property is independent of the other individuals) that can take discrete random values (realizations of the vector $(X_1, X_2, \ldots, X_n)$) with respective probabilities $(p_1, p_2, \ldots, p_n)$ where, by the axioms of probabilities (see section Probabilities):

$$p_i \in [0, 1] \quad\text{and}\quad \sum_i p_i = 1 \tag{7.72}$$

Definitions (#89):

D1. Let X be a numeric (quantitative) random variable (r.v.). In practice it is most of the time fully described by the probability of each realization (for discrete variables) or by the cumulative probability (for discrete AND continuous variables) that X is less than or equal to x, for every value x. This cumulative probability is denoted by:

$$F(x) = P(X \le x), \quad \forall x \in \mathbb{R} \tag{7.73}$$

with:

$$0 \le P(X = x) \le 1 \quad\text{and}\quad 0 \le F(x) \le 1 \tag{7.74}$$

where F(x) is named the "repartition function" (cumulative distribution function) of the random variable X. It is the theoretical proportion of the population whose value is less than or equal to x. It follows for example that:

$$P(X > x) = 1 - F(x) \Leftrightarrow P(X \le x) + P(X > x) = 1 \tag{7.75}$$

More generally, for any two numbers a and b with a < b, we have:

$$P(a < X \le b) = F(b) - F(a) \tag{7.76}$$

D2.
The "empirical repartition function" is naturally defined by (we indicate the different notations found in the literature):

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} 1_{x_i \le x} = \frac{1}{n}\sum_{i=1}^{n} \delta_{x_i \le x} = \frac{\#\{i : x_i \le x\}}{n} \tag{7.77}$$

associated with a sample of independent and identically distributed variables, which, as we know, is named a "random vector" and denoted by $(x_1, x_2, \ldots, x_n)$.

It is simply the cumulative frequency of appearance below a certain fixed value, normalized to unity (the approach most people naturally use when looking for the repartition function). So if we take again the example of wages already used above, we have for example, for x fixed to 1,800:

| Ordered wages $x_i$ | $1_{x_i \le x}$ |
|---|---|
| 1,200 | 1 |
| 1,220 | 1 |
| 1,250 | 1 |
| 1,300 | 1 |
| 1,350 | 1 |
| 1,450 | 1 |
| 1,450 | 1 |
| 1,560 | 1 |
| 1,600 | 1 |
| 1,800 | 1 |
| 1,900 | 0 |
| 2,150 | 0 |
| 2,310 | 0 |
| 2,600 | 0 |
| 3,000 | 0 |
| 3,400 | 0 |
| 4,800 | 0 |

Table 4.22 - Example of the empirical repartition function

And then:

$$F_{17}(1{,}800) = \frac{1}{17}\sum_{i=1}^{17} 1_{x_i \le 1{,}800} = \frac{10}{17} \cong 59\% \tag{7.78}$$

The repartition function is clearly a monotonically increasing function (more precisely "non-decreasing") whose values range from 0 to 1.

7.3.1.1 Mean and Deviation of Discrete Random Variables

Definition (#90): We define the "expectation" or "mean", also named "moment of order 1", of the random variable X by the relation (with various notations):

$$\mu_X = E(X) = \sum_i p_i x_i \tag{7.79}$$

also sometimes named the "parts rule". In other words, every event of the sample space is associated with a probability, and we also associate a value with it (given by the random variable). The question is then: what value can we expect in the long run? The expected value (the mean...) is then the average of the values of all events of the sample space, weighted by their probabilities.
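The empirical repartition function of Table 4.22 and the expectation (7.79) can be checked with a few lines of Python. This is a sketch: the function names are hypothetical, and exact fractions are used so that the value 10/17 is reproduced without rounding; the fair die is an illustrative example not taken from the text.

```python
from bisect import bisect_right
from fractions import Fraction

# Ordered wages of Table 4.22
wages = [1200, 1220, 1250, 1300, 1350, 1450, 1450, 1560, 1600,
         1800, 1900, 2150, 2310, 2600, 3000, 3400, 4800]

def empirical_cdf(sample, x):
    """F_n(x) = (1/n) * #{i : x_i <= x}, relation (7.77), as an exact fraction."""
    ordered = sorted(sample)
    return Fraction(bisect_right(ordered, x), len(ordered))

def expectation(values, probabilities):
    """E(X) = sum_i p_i x_i, relation (7.79), for a discrete random variable."""
    if abs(sum(probabilities) - 1) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for x, p in zip(values, probabilities))

print(empirical_cdf(wages, 1800))                      # 10/17, i.e. about 59 %
print(expectation(range(1, 7), [Fraction(1, 6)] * 6))  # 7/2 for a fair die
```

Sorting once and counting with `bisect_right` gives the number of values less than or equal to x, which is exactly the sum of the indicators in (7.77).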
If the probability is given by a discrete distribution function $f(x_i)$ of the random variable (see the definitions of distribution functions further below in the text), we then have:

$$p_i = f(x_i) \Rightarrow \mu_X = E(X) = \sum_i x_i f(x_i) \tag{7.80}$$

Remark:

R1. The mean $\mu_X$ can also be written simply $\mu$ if there is no possible confusion about the random variable.

R2. If we consider the realizations $(x_1, x_2, \ldots, x_n)$ of the random variable as the components of a vector $\vec{x}$ and the associated probabilities (or weights) $(p_1, p_2, \ldots, p_n)$ as the components of a vector $\vec{p}$, we can write the mean compactly using the scalar product (see section Vector Calculus):

$$\vec{p}\circ\vec{x} = \langle\vec{p}, \vec{x}\rangle = \sum_{i=1}^{n} p_i x_i = E(X) \tag{7.81}$$

Here are the most important mathematical properties of the mean for any random variable (whatever its distribution law!), which we will use often throughout this section (and in many others involving statistics):

P1. Multiplication by a constant (homogeneity):

$$E(aX) = \sum_i a x_i P(X = x_i) = a\sum_i x_i P(X = x_i) = aE(X) \tag{7.82}$$

P2. Sum of two random variables (independent or not!):

$$\begin{aligned}
E(X + Y) &= \sum_{i,j}(x_i + y_j)P\big((X = x_i)\cap(Y = y_j)\big)\\
&= \sum_{i,j} x_i P\big((X = x_i)\cap(Y = y_j)\big) + \sum_{i,j} y_j P\big((X = x_i)\cap(Y = y_j)\big)\\
&= \sum_i x_i \sum_j P\big((X = x_i)\cap(Y = y_j)\big) + \sum_j y_j \sum_i P\big((Y = y_j)\cap(X = x_i)\big)\\
&= \sum_i x_i P\Big((X = x_i)\cap\bigcup_j (Y = y_j)\Big) + \sum_j y_j P\Big((Y = y_j)\cap\bigcup_i (X = x_i)\Big)\\
&= \sum_i x_i P(X = x_i) + \sum_j y_j P(Y = y_j) = E(X) + E(Y)
\end{aligned} \tag{7.83}$$

where, to obtain the fourth line, we used the property seen in the section Probabilities for pairwise disjoint events:

$$P\Big(\bigcup_{i\in\mathbb{N}} A_i\Big) = \sum_{i\in\mathbb{N}} P(A_i) \tag{7.84}$$

We deduce that for n random variables $X_i$ following any probability distributions:

$$E\Big(\sum_i X_i\Big) = \sum_i E(X_i) \tag{7.85}$$

P3. The mean of a constant a is equal to the constant itself:

$$E(a) = \sum_i a p_i = a\sum_i p_i = a \cdot 1 = a \tag{7.86}$$

P4. Mean of the product of two random variables:

$$E(X \cdot Y) = \sum_{i,j} x_i y_j P(X = x_i, Y = y_j) \tag{7.87}$$

And if the two random variables are independent, the joint probability is equal to the product of the individual probabilities (see section Probabilities).
Therefore we have:

$$E(X \cdot Y) = \sum_{i,j} x_i y_j P(X = x_i, Y = y_j) = \sum_{i,j} x_i y_j P(x_i)P(y_j) = \sum_i x_i P(x_i)\sum_j y_j P(y_j) = E(X)E(Y) \tag{7.88}$$

So the mean of the product of independent random variables is always equal to the product of their means.

We will assume as obvious that these four properties extend to the continuous case!

Definition (#91): After having described the central tendency by the mean, it is interesting to have an indicator that reflects the dispersion around the mean. This indicator is named the "variance of X", the "second-order centered moment" or the "mean square error (MSE)", written $V(X)$ or $\sigma_X^2$ (read "sigma squared"), and is given in its discrete form by:

$$\sigma_X^2 = V(X) = MSE = E\big[(X - \mu_X)^2\big] = \sum_i (x_i - \mu_X)^2 f(x_i) = \sum_i (x_i - \mu_X)^2 p_i \tag{7.89}$$

The variance is however not directly comparable to the mean, because its units are the square of the units of the random variable, which follows directly from its definition. To obtain a dispersion indicator that can be compared with the parameters of central tendency (mean, median and... mode), it then suffices to take the square root of the variance. For convenience, we define the "standard deviation" of X by:

$$\sigma_X = \sigma(X) = \sqrt{V(X)} \tag{7.90}$$

Remark:

R1. The standard deviation $\sigma_X$ of the random variable X can be written simply $\sigma$ if there is no possible confusion.

R2. In the literature, the standard deviation and the variance are often named "dispersion parameters", as opposed to the mean, mode and median, which are named "positional parameters".

Definition (#92): The ratio (expressed in %):

$$CV = \frac{\sigma_X}{\mu_X} \tag{7.91}$$

is often used in business to compare the standard deviation with the mean and is named the "coefficient of variation C.V.", because it has no units (which is its main advantage!) and because many industrial statistical methods consider that a good C.V. should ideally be only a few %.
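A minimal Python sketch of definitions (7.89) to (7.91) and of property P4 (7.88), assuming a discrete variable given as lists of values and probabilities; the fair die and the pair of independent dice are illustrative examples, not data from the text, and the function names are hypothetical.

```python
import math

def discrete_moments(values, probabilities):
    """Mean (7.79), variance (7.89), standard deviation (7.90) and C.V. (7.91)."""
    pairs = list(zip(values, probabilities))
    mean = sum(p * x for x, p in pairs)
    variance = sum(p * (x - mean) ** 2 for x, p in pairs)
    std = math.sqrt(variance)
    cv = std / mean  # raises ZeroDivisionError when the mean is 0: the C.V. is then undefined
    return mean, variance, std, cv

def expectation_of_product(joint):
    """E(XY) from a joint distribution given as {(x, y): probability}, formula (7.87)."""
    return sum(p * x * y for (x, y), p in joint.items())

die = list(range(1, 7))
mean, var, std, cv = discrete_moments(die, [1 / 6] * 6)
print(f"mean={mean:.4f} variance={var:.4f} std={std:.4f} CV={cv:.1%}")

# Two independent dice: joint probability = product of the marginal probabilities
joint = {(x, y): (1 / 6) * (1 / 6) for x in die for y in die}
print(expectation_of_product(joint), mean * mean)  # both close to 12.25
```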
More generally, for any statistical estimator $\theta$ (sum, average, median, etc.) we can build a coefficient of variation such that:

$$CV = \frac{\sigma_\theta}{\theta} \tag{7.92}$$

Thus, in practice we consider that:

| Coefficient of variation | Quality |
|---|---|
| 20% | Poor |
| 10% | Acceptable |
| 5% | Controlled |
| 2.5% | Excellent |
| 1.25% | World Class |
| 0.0625% | Rarely achieved |

Table 4.23 - Qualitative judgments of C.V.s commonly accepted

Why do we find a square (respectively a square root) in the definition of the variance? The intuitive reason is simple (the rigorous one much less so...). Remember that we have shown above that the weighted sum of the deviations from the arithmetic average is always zero:

$$\sum_{i=1}^{r} n_i(x_i - \mu) = 0 \tag{7.93}$$

If we identify each probability with the class size normalized by n (i.e. $p_i = n_i/n$), we come upon a relation that is the same as the variance, except that the term in brackets is not squared. And then we immediately see the problem... this dispersion measure would always be zero, hence the need to square the deviations. We could also imagine using the absolute values of the deviations from the mean, but for a number of reasons that we will see later during our study of estimators, the choice of squaring is quite natural.

Note, however, the common use in industry of two other indicators of dispersion:

1. The "mean absolute deviation" (mean of the absolute values of the deviations from the mean):

$$AVEDEV = \frac{1}{n}\sum_{i=1}^{n}|x_i - \mu| \tag{7.94}$$

This is a very elementary indicator, used when we do not want to make statistical inference on a series of measures. It can easily be calculated in the English version of Microsoft Excel 11.8346 using the AVEDEV() function.

2.
The "median absolute deviation", denoted MAD (median of the absolute values of the deviations from the median):

$$MAD = M_e\big(|X - M_e(X)|\big) \tag{7.95}$$

which is considered a more robust measure of dispersion than the mean absolute deviation or the standard deviation (unfortunately this indicator is not natively available in spreadsheet software).

Example: Consider the following measures of a random variable X:

$$(1, 1, 2, 2, 4, 6, 9) \tag{7.96}$$

whose median value is, as we know:

$$M_e(X) = M_e(1, 1, 2, 2, 4, 6, 9) = 2 \tag{7.97}$$

The absolute deviations from the median are then:

$$|X - M_e(X)| = (1, 1, 0, 0, 2, 4, 7) \tag{7.98}$$

Placed in ascending order, we have:

$$(0, 0, 1, 1, 2, 4, 7) \tag{7.99}$$

where we easily identify the median absolute deviation, which is:

$$MAD = M_e(0, 0, 1, 1, 2, 4, 7) = 1 \tag{7.100}$$

When a series of measures is available, we can estimate the experimental value of the mean (expectation) and of the variance with the following estimators (they are simply the average and the variance of a sample when the events are equally likely), with the specific notation:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad\text{and}\quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2 \tag{7.101}$$

Proof 4.28.4. First for the mean (with $p_i = 1/n$):

$$\mu = E(X) = \sum_{i=1}^{n} p_i x_i = \sum_{i=1}^{n}\frac{1}{n}x_i = \frac{1}{n}\sum_{i=1}^{n} x_i = \hat{\mu} \tag{7.102}$$

And for the variance:

$$\sigma^2 = E\big((X - E(X))^2\big) = E\big((X - \hat{\mu})^2\big) = \sum_{i=1}^{n} p_i(x_i - \hat{\mu})^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2 = \hat{\sigma}^2 \tag{7.103}$$

Q.E.D.

Theorem 4.29. Let us now prove a very nice little property: the arithmetic average is an optimum (in fact the minimum) of the sum of squared errors.

Proof 4.29.1.

$$\sum_{i=1}^{n}(x_i - \alpha)^2 = \sum_{i=1}^{n} x_i^2 - 2\alpha\sum_{i=1}^{n} x_i + n\alpha^2 \tag{7.104}$$

If we look for the value of $\alpha$ for which the derivative of the above expression is equal to zero:

$$\frac{d}{d\alpha}\left(\sum_{i=1}^{n} x_i^2 - 2\alpha\sum_{i=1}^{n} x_i + n\alpha^2\right) = 0 \tag{7.105}$$

then $\alpha$ is an optimum.
We therefore have:

$$\frac{d}{d\alpha}\left(\sum_{i=1}^{n} x_i^2 - 2\alpha\sum_{i=1}^{n} x_i + n\alpha^2\right) = -2\sum_{i=1}^{n} x_i + 2n\alpha = 0 \tag{7.106}$$

or, after rearrangement and an elementary simplification:

$$\alpha = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{7.107}$$

Q.E.D.

It is indeed the arithmetic average! To see whether this extremum is a maximum or a minimum, we just need to calculate the second derivative (see section Differential and Integral Calculus) and check that it is positive: here it is the positive constant 2n (i.e. the first derivative increases when $\alpha$ increases). Therefore the extremum is indeed a minimum!!!

Remark: The sum that appears in the expression of the variance (standard deviation) is named the "sum of squared deviations from the mean" or "sum of squared errors from the mean". In the context of the study of ANOVA (see further below) it is also named the "total sum of squares", "total variation" or "sum of square errors".

Before we continue, let us recall the geometric mean seen above (widely used for returns in finance or for growth analyses in % of sales):

$$\mu_g = \sqrt[n]{\prod_{i=1}^{n} x_i} \tag{7.108}$$

That is fine, but employees in financial departments also need to calculate the standard deviation associated with this average. The idea is then to take the logarithm, which reduces it to a simple arithmetic mean (it is still obviously an estimator!):

$$\ln(\mu_g) = \frac{1}{n}\sum_{i=1}^{n}\ln(x_i) \tag{7.109}$$

Therefore, since after taking the logarithm we have the arithmetic mean of the logarithms of the values, the logarithm of the geometric standard deviation will be (with a physicist-like reasoning...):

$$\ln(\sigma_g) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(\ln(x_i) - \ln(\mu_g)\big)^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\ln^2\!\left(\frac{x_i}{\mu_g}\right)} \tag{7.110}$$

Then we just take the exponential of the standard deviation of the logarithms of the values to get the "geometric standard deviation":

$$\sigma_g = e^{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\ln^2\left(\frac{x_i}{\mu_g}\right)}} \tag{7.111}$$

The variance can also be written in a very important way named the "Huygens relation", "König-Huygens theorem" or "Steiner translation theorem", which we will reuse several times thereafter. Let us see what it is:

$$\begin{aligned}
V(X) &= E\big[(X - \mu_X)^2\big] = \sum_i (x_i - \mu_X)^2 f(x_i) = \sum_i \big(x_i^2 - 2x_i\mu_X + \mu_X^2\big)f(x_i)\\
&= \sum_i x_i^2 f(x_i) - 2\mu_X\sum_i x_i f(x_i) + \mu_X^2\sum_i f(x_i) = \sum_i x_i^2 f(x_i) - 2\mu_X^2 + \mu_X^2\\
&= \sum_i x_i^2 f(x_i) - \mu_X^2 = E(X^2) - \mu_X^2 = E(X^2) - E(X)^2
\end{aligned} \tag{7.112}$$

Let us now make a small detour to a common source of errors in business when several statistical series are handled (a very common case in industry as well as in insurance or finance)! Consider two data series on the same character:

• $(x_1, n_1), (x_2, n_2), \ldots, (x_p, n_p)$: sample of total size n, arithmetic average $\bar{x}$, standard deviation $\sigma_x$.
• $(y_1, m_1), (y_2, m_2), \ldots, (y_q, m_q)$: sample of total size m, arithmetic average $\bar{y}$, standard deviation $\sigma_y$.

We then have for the overall average:

$$\bar{z} = \frac{\sum_{i=1}^{p} n_i x_i + \sum_{i=1}^{q} m_i y_i}{n + m} = \frac{n\frac{1}{n}\sum_{i=1}^{p} n_i x_i + m\frac{1}{m}\sum_{i=1}^{q} m_i y_i}{n + m} = \frac{n\bar{x} + m\bar{y}}{n + m} \tag{7.113}$$

So the average of the averages, $(\bar{x} + \bar{y})/2$, is not equal to the overall average (first common mistake in business), except if the two data series have the same sample size (n = m)!!!

Let us now look at the standard deviation in the same situation. First remember that we have:

$$\sigma_x^2 = \sum_{i=1}^{p} p_i(x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{p} n_i(x_i - \bar{x})^2 \quad\text{and}\quad \sigma_y^2 = \frac{1}{m}\sum_{i=1}^{q} m_i(y_i - \bar{y})^2 \tag{7.114}$$

To continue, recall that we have previously proved the Huygens theorem and therefore, for a series z of total size k with frequencies $k_i$:

$$V(Z) = \sigma_z^2 = E(Z^2) - E(Z)^2 = \frac{1}{k}\sum_i k_i z_i^2 - \bar{z}^2 \tag{7.115}$$

Therefore, for the two merged series (of total size n + m), using (7.115) for each series:

$$\begin{aligned}
\sigma_z^2 &= \frac{\sum_{i=1}^{p} n_i x_i^2 + \sum_{i=1}^{q} m_i y_i^2}{n + m} - \left(\frac{n\bar{x} + m\bar{y}}{n + m}\right)^2 = \frac{n\left(\frac{1}{n}\sum_{i=1}^{p} n_i x_i^2\right) + m\left(\frac{1}{m}\sum_{i=1}^{q} m_i y_i^2\right)}{n + m} - \left(\frac{n\bar{x} + m\bar{y}}{n + m}\right)^2\\
&= \frac{n(\sigma_x^2 + \bar{x}^2) + m(\sigma_y^2 + \bar{y}^2)}{n + m} - \left(\frac{n\bar{x} + m\bar{y}}{n + m}\right)^2 = \frac{n\sigma_x^2 + m\sigma_y^2}{n + m} + \frac{n\bar{x}^2 + m\bar{y}^2}{n + m} - \left(\frac{n\bar{x} + m\bar{y}}{n + m}\right)^2\\
&= \frac{n\sigma_x^2 + m\sigma_y^2}{n + m} + \frac{(n\bar{x}^2 + m\bar{y}^2)(n + m) - (n\bar{x} + m\bar{y})^2}{(n + m)^2}\\
&= \frac{n\sigma_x^2 + m\sigma_y^2}{n + m} + \frac{(n^2\bar{x}^2 + nm\bar{y}^2 + nm\bar{x}^2 + m^2\bar{y}^2) - (n^2\bar{x}^2 + 2nm\bar{x}\bar{y} + m^2\bar{y}^2)}{(n + m)^2}
\end{aligned} \tag{7.116}$$

and finally:

$$\sigma_z^2 = \frac{n\sigma_x^2 + m\sigma_y^2}{n + m} + \frac{nm(\bar{x}^2 + \bar{y}^2 - 2\bar{x}\bar{y})}{(n + m)^2} = \frac{n\sigma_x^2 + m\sigma_y^2}{n + m} + nm\frac{(\bar{x} - \bar{y})^2}{(n + m)^2} \tag{7.117}$$

So we see that the overall variance is not equal to the average of the variances of the two series (second common mistake in business), unless the sample sizes and the arithmetic averages are the same in both series (that is to say n = m and $\bar{x} = \bar{y}$)!!!

Consider now X a random variable with mean $\mu$ (a constant, determined value) and variance $\sigma^2$ (a constant, determined value); we define the "reduced centered variable" by the relation:

$$Y = \frac{X - \mu}{\sigma} \tag{7.118}$$

Theorem 4.30. We prove in a very simple way, using the linearity of the mean and the behavior of the variance under multiplication by a scalar, that:

$$E(Y) = 0, \quad V(Y) = 1 \tag{7.119}$$

Proof 4.30.1. For the proof we just use the definitions of the mean and of the variance (using the Huygens theorem for the latter). Let us begin with the mean:

$$E(Y) = E\left(\frac{X - \mu}{\sigma}\right) = \frac{1}{\sigma}E(X - \mu) = \frac{1}{\sigma}\big(E(X) - E(\mu)\big) = \frac{1}{\sigma}(\mu - \mu) = 0 = \mu_Y \tag{7.120}$$

And now the variance, using the Huygens theorem:

$$\begin{aligned}
V(Y) &= E(Y^2) - E(Y)^2 = E(Y^2) - 0^2 = E(Y^2) = E\left(\left(\frac{X - \mu}{\sigma}\right)^2\right) = \frac{1}{\sigma^2}E\big(X^2 - 2\mu X + \mu^2\big)\\
&= \frac{1}{\sigma^2}\big(E(X^2) - 2\mu E(X) + \mu^2\big) = \frac{1}{\sigma^2}\big(E(X^2) - 2\mu^2 + \mu^2\big) = \frac{1}{\sigma^2}\big(E(X^2) - \mu^2\big)\\
&= \frac{1}{\sigma^2}\big((V(X) + E(X)^2) - \mu^2\big) = \frac{1}{\sigma^2}\big(V(X) + \mu^2 - \mu^2\big) = \frac{V(X)}{\sigma^2} = \frac{\sigma^2}{\sigma^2} = 1
\end{aligned} \tag{7.121}$$

Q.E.D.

Thus, any statistical distribution defined by a mean and a standard deviation can be transformed into another distribution that is often easier to analyze statistically. By making this transformation, we obtain a random variable for which the parameters of the original distribution law no longer need to be known. When we do the same with other laws, and in the general case, we speak of "pivotal variables".

Here are some important mathematical properties of the variance:

P1. Multiplication by a constant:

$$V(aX) = \sum_i f(x_i)(ax_i - a\mu)^2 = a^2\sum_i f(x_i)(x_i - \mu)^2 = a^2V(X) \tag{7.122}$$

P2. Sum of two random variables:

$$\begin{aligned}
V(X + Y) &= \sum_{i,j}\big((x_i + y_j) - (\mu_X + \mu_Y)\big)^2 f(x_i, y_j) = \sum_{i,j}\big((x_i - \mu_X) + (y_j - \mu_Y)\big)^2 f(x_i, y_j)\\
&= \sum_{i,j}\big((x_i - \mu_X)^2 + 2(x_i - \mu_X)(y_j - \mu_Y) + (y_j - \mu_Y)^2\big)f(x_i, y_j)\\
&= \sum_{i,j}(x_i - \mu_X)^2 f(x_i, y_j) + 2\sum_{i,j}(x_i - \mu_X)(y_j - \mu_Y)f(x_i, y_j) + \sum_{i,j}(y_j - \mu_Y)^2 f(x_i, y_j)\\
&= V(X) + 2E\big[(X - \mu_X)(Y - \mu_Y)\big] + V(Y) = V(X) + 2\,\text{cov}(X, Y) + V(Y)
\end{aligned} \tag{7.123}$$

where we meet for the first time the concept of "covariance", denoted by cov().

P3. Product of two random variables (using the Huygens theorem):

$$V(X \cdot Y) = E\big((XY)^2\big) - E(XY)^2 = E(X^2Y^2) - E(XY)^2 \tag{7.124}$$

And if the two random variables are independent (so that $X^2$ and $Y^2$ are independent as well), we get:

$$V(X \cdot Y) = E(X^2)E(Y^2) - E(X)^2E(Y)^2 \tag{7.125}$$

which we can rewrite using once again the Huygens theorem:

$$V(X \cdot Y) = \big(V(X) + E(X)^2\big)\big(V(Y) + E(Y)^2\big) - E(X)^2E(Y)^2 \tag{7.126}$$

We will assume as obvious that these properties extend to the continuous case!
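The dispersion indicators and identities of this part (the mean and median absolute deviations (7.94) and (7.95), the merged mean and variance (7.113) and (7.117), and the reduced centered variable (7.118)) can be checked numerically with the Python sketch below. The helper names are hypothetical, variances use the 1/n convention of the text, and the two series of the usage example are arbitrary.

```python
import math
from statistics import median

def avedev(xs):
    """Mean absolute deviation from the mean, formula (7.94)."""
    mu = sum(xs) / len(xs)
    return sum(abs(x - mu) for x in xs) / len(xs)

def mad(xs):
    """Median absolute deviation from the median, formula (7.95)."""
    med = median(xs)
    return median(abs(x - med) for x in xs)

def mean_variance(xs):
    """Mean and variance with the 1/n convention used in the text."""
    mu = sum(xs) / len(xs)
    return mu, sum((x - mu) ** 2 for x in xs) / len(xs)

def merged_mean_variance(n, mean_x, var_x, m, mean_y, var_y):
    """Overall mean (7.113) and variance (7.117) of two merged series."""
    total = n + m
    mean = (n * mean_x + m * mean_y) / total
    var = (n * var_x + m * var_y) / total + n * m * (mean_x - mean_y) ** 2 / total ** 2
    return mean, var

def standardize(xs):
    """Reduced centered values (x_i - mu) / sigma, formula (7.118)."""
    mu, var = mean_variance(xs)
    sigma = math.sqrt(var)
    return [(x - mu) / sigma for x in xs]

sample = [1, 1, 2, 2, 4, 6, 9]
print(mad(sample))  # 1, as in the example (7.96) to (7.100)

xs, ys = [1, 2, 3, 4], [10, 20]
(mx, vx), (my, vy) = mean_variance(xs), mean_variance(ys)
print(merged_mean_variance(len(xs), mx, vx, len(ys), my, vy))
print(mean_variance(xs + ys))  # same values, computed directly on the merged data
print((mx + my) / 2)           # average of the averages: not the overall mean
```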
7.3.1.2 Discrete Covariance

We met in one of the last equations the concept of "covariance", for which we will determine a more convenient expression later:

$$\text{cov}(X, Y) = c_{X,Y} = E\big[(X - \mu_X)(Y - \mu_Y)\big] \tag{7.127}$$

We now introduce a more general expression involving the covariance, very important in many application fields. Writing $A = x_i - \mu_X$, $B = y_j - \mu_Y$ and $C = z_k - \mu_Z$:

$$\begin{aligned}
V(X + Y + Z) &= \sum_i\sum_j\sum_k\big((x_i + y_j + z_k) - (\mu_X + \mu_Y + \mu_Z)\big)^2 f(x_i, y_j, z_k)\\
&= \sum_i\sum_j\sum_k\big((x_i - \mu_X) + (y_j - \mu_Y) + (z_k - \mu_Z)\big)^2 f(x_i, y_j, z_k)\\
&= \sum_i\sum_j\sum_k (A + B + C)^2 f(x_i, y_j, z_k)\\
&= \sum_i\sum_j\sum_k (A^2 + B^2 + C^2 + 2AB + 2BC + 2AC)f(x_i, y_j, z_k)\\
&= V(X) + V(Y) + V(Z) + 2\sum_i\sum_j\sum_k (AB + BC + AC)f(x_i, y_j, z_k)\\
&= V(X) + V(Y) + V(Z) + 2\,\text{cov}(X, Y) + 2\,\text{cov}(Y, Z) + 2\,\text{cov}(X, Z)
\end{aligned} \tag{7.128}$$

Now we change the notation to simplify even more:

$$V(X_1 + X_2 + X_3) = V(X_1) + V(X_2) + V(X_3) + 2\,\text{cov}(X_1, X_2) + 2\,\text{cov}(X_2, X_3) + 2\,\text{cov}(X_1, X_3) = \sum_{i=1}^{3}V(X_i) + 2\sum_{i<j}\text{cov}(X_i, X_j) \tag{7.129}$$

Therefore in the general case:

$$V\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i=1}^{n}V(X_i) + 2\sum_{i<j}\text{cov}(X_i, X_j) \tag{7.130}$$

Or, using standard deviations:

$$\sigma^2_{\sum_i X_i} = \sum_{i=1}^{n}\sigma^2_{X_i} + 2\sum_{i<j}\text{cov}(X_i, X_j) \tag{7.131}$$

Using the properties of the mean (in particular that E(X) is a constant and that the mean of a constant is the constant itself) we can write the covariance in a much simpler way for calculation purposes:

$$\begin{aligned}
\text{cov}(X, Y) &= E\big[(X - E(X))(Y - E(Y))\big] = E\big[XY - E(X)Y - XE(Y) + E(X)E(Y)\big]\\
&= E(XY) - E\big(E(X)Y\big) - E\big(XE(Y)\big) + E\big(E(X)E(Y)\big)\\
&= E(XY) - E(X)E(Y) - E(X)E(Y) + E(X)E(Y) = E(XY) - E(X)E(Y)
\end{aligned} \tag{7.132}$$

and we obtain the relation widely used in statistics and finance, called the "covariance formula":

$$c_{X,Y} = \text{cov}(X, Y) = E(XY) - E(X)E(Y) \tag{7.133}$$

which is best known, for equally likely observations, when written as:

$$c_{X,Y} = \text{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\bar{y} \tag{7.134}$$

If X = Y (equivalent to a univariate covariance) we fall back again on the Huygens theorem:

$$c_{X,X} = \text{cov}(X, X) = E(XX) - E(X)E(X) = E(X^2) - E(X)^2 = V(X) \tag{7.135}$$

Remark: Statistics can be partitioned according to the number of random variables studied. Thus, when a single random variable is studied, we speak of "univariate statistics", for two random variables of "bivariate statistics" and, in general, of "multivariate statistics".
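On equally likely observations, the covariance formula (7.134) can be checked against the definition (7.127), together with the variance of a sum (7.123) and the univariate case (7.135). A short Python sketch with arbitrary illustrative data and hypothetical function names:

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov_definition(xs, ys):
    """Definition (7.127) with equal weights 1/n."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def cov_formula(xs, ys):
    """Covariance formula (7.134): (1/n) sum x_i y_i - x_bar * y_bar."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

def variance(xs):
    """V(X) = cov(X, X), relation (7.135)."""
    return cov_definition(xs, xs)

xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 3.0, 8.0]
sums = [x + y for x, y in zip(xs, ys)]

print(cov_definition(xs, ys), cov_formula(xs, ys))  # 5.0 5.0
# V(X + Y) versus V(X) + V(Y) + 2 cov(X, Y): both sides agree
print(variance(sums), variance(xs) + variance(ys) + 2 * cov_definition(xs, ys))
```

The formula version needs a single pass over the products, which is why it is popular for hand calculations; on large floating-point data the definition (centered first) is usually numerically safer.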
Thus, when a single random variable is studied, we speak of "univariate statistics", for two random variables of "bivariate statistics" and in general, of "multivariate statistics". V / If and only if the variables are equally likely, we find the covariance in the literature in the following form, sometimes named "Pearson covariance", which derives from calculations that we have done previously with the mean: 1 n c x,Y = -5^(Vi - yv)(xi - Hx) (7.136) n U Covariance is a measure of the simultaneous variation of X and Y. Indeed, if X and Y generally grow simultaneously, the products (y { — /xy)(.x,; — nx) will be positive (positively correlated), whereas if Y decreases as X increases, these same products will be negative (negative correla- tion). Note that if we distribute the terms of the last equation, we have: 1 n C x,Y = cov(X, Y) = y'fy,; - Hv){xi - y x ) n fYi 1 n -y)( x i-x) n i = 1 1 n - Y, x i(vi ~y)~ x (vi - y) n 5 2 x i(vi-y ) ~ x Y^(yi \i = 1 i=l (7.137) and we have already shown that the sum of the deviations from the mean is zero. Hence we get another common way to write the covariance: 1 n C x,Y = cov(X, Y) = - Y^Xi(yi - y) (7.138) n Xt and by symmetry: 1 n C x,Y = cov(X, Y) = - yi( x i - x ) (7.139) n i = i 352/5785 info @ sciences. ch EAME v3. 5-2013 4. Arithmetic So in the end, in the equiprobable case, we finally have the equivalent three important relations used in various sections of this book: (7.140) In the section Theoretical Computing for the study of linear regression and factor analysis we will need the explicit expression of the bilinearity property of the variance. To see what it is exactly, consider three random variables X , Y and Z and two constants a and b. 
Then using the relation (7.138) given above, we have:

$$\begin{aligned}
\operatorname{cov}(Y,aX+bZ) &= \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})(ax_i+bz_i) = \frac{1}{n}\sum_{i=1}^{n}\big[(y_i-\bar{y})ax_i+(y_i-\bar{y})bz_i\big]\\
&= \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})ax_i+\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})bz_i\\
&= a\,\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})x_i+b\,\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})z_i\\
&= a\operatorname{cov}(X,Y)+b\operatorname{cov}(Y,Z)
\end{aligned} \tag{7.141}$$

This relation is also important and will be used in several sections of this book (Economy, Numerical Methods). It also lets us directly obtain the covariance of sums of random variables.

Example: Let $X$, $Y$, $Z$, $T$ be four random variables defined on the same population. We want to compute the following covariance:

$$\operatorname{cov}(3X+5Y,\,4Z-2T) \tag{7.142}$$

We will develop it in two phases (this is also why we call it the "bilinearity property"). First with respect to the second argument (an arbitrary choice!):

$$\operatorname{cov}(3X+5Y,\,4Z-2T) = 4\operatorname{cov}(3X+5Y,\,Z)-2\operatorname{cov}(3X+5Y,\,T) \tag{7.143}$$

And then with respect to the first:

$$\operatorname{cov}(3X+5Y,\,4Z-2T) = 4\big[3\operatorname{cov}(X,Z)+5\operatorname{cov}(Y,Z)\big]-2\big[3\operatorname{cov}(X,T)+5\operatorname{cov}(Y,T)\big] \tag{7.144}$$

So in the end:

$$\operatorname{cov}(3X+5Y,\,4Z-2T) = 12\operatorname{cov}(X,Z)+20\operatorname{cov}(Y,Z)-6\operatorname{cov}(X,T)-10\operatorname{cov}(Y,T) \tag{7.145}$$

Now, consider a set of random vectors $X_i$ with components $(x_1,x_2,\ldots,x_n)_i$. Computing the covariance of the components by pairs gives what is called the "covariance matrix". It is a tool widely used in finance, management and statistical numerical methods! We define the component $(m,n)$ of the covariance matrix by:

$$c_{m,n} = \operatorname{cov}(X_m,X_n) = E\big[(X_m-\mu_{X_m})(X_n-\mu_{X_n})\big] \tag{7.146}$$

We can therefore write a symmetric matrix (which is necessarily square) in the form:

$$\Sigma = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n}\\ c_{21} & c_{22} & \cdots & c_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ c_{n1} & c_{n2} & \cdots & c_{nn}\end{pmatrix} \tag{7.147}$$

where $\Sigma$ is the traditional letter used to denote the covariance matrix. The matrix is square ($n\times n$) and symmetric, so only $n(n+1)/2$ components are needed to determine it entirely. This is trivial, but it will matter when we study structural equation modeling in the Numerical Methods section.

This matrix has a remarkable property. If we take the set of all random vectors and calculate the covariance matrix, the diagonal obviously gives the variance of each component (see examples in the chapters Economics, Numerical Methods or Industrial Engineering). Recall that:

$$\operatorname{cov}(X_m,X_m) = E\big[(X_m-\mu_{X_m})(X_m-\mu_{X_m})\big] = E\big[(X_m-\mu_{X_m})^2\big] = V(X_m) = \sigma_{X_m}^2 \tag{7.148}$$

This is why this matrix is often named the "variance-covariance matrix". It is sometimes also written as follows:

$$\Sigma = \begin{pmatrix} V_{11} & c_{12} & \cdots & c_{1n}\\ c_{21} & V_{22} & \cdots & c_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ c_{n1} & c_{n2} & \cdots & V_{nn}\end{pmatrix} = \begin{pmatrix} \sigma_{1}^2 & c_{12} & \cdots & c_{1n}\\ c_{21} & \sigma_{2}^2 & \cdots & c_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ c_{n1} & c_{n2} & \cdots & \sigma_{n}^2\end{pmatrix} \tag{7.149}$$

And, a little abusively, it is sometimes written as:

$$\Sigma = \begin{pmatrix} \sigma_{1}^2 & \sigma_{12} & \cdots & \sigma_{1n}\\ \sigma_{21} & \sigma_{2}^2 & \cdots & \sigma_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{n}^2\end{pmatrix} \tag{7.150}$$

This matrix quickly shows which pairs of random variables have a negative covariance. It therefore also shows for which random variables the variance of the sum is smaller than the sum of the variances!

Remark
As we already mentioned, this matrix is very important and we will often see it again:
• in the section Economy, during our study of modern portfolio theory;
• for data mining techniques in the section Theoretical Computing (principal component analysis, for example, but not only!);
• in Industrial Engineering, during our study of bivariate control charts.

Recall now that probability theory defines (see section Probabilities) two events $A$ and $B$ to be independent if and only if:

$$P(A\cap B) = P(A)P(B) \tag{7.151}$$

Similarly, by extension, we define the independence of discrete random variables.
Definition (#93): Let $X$, $Y$ be two discrete random variables. We say that $X$, $Y$ are independent if and only if:

$$\forall x,y\in\mathbb{R}\quad P(X=x,\,Y=y) = P(X=x)P(Y=y) \tag{7.152}$$

More generally, the discrete variables $X_1,X_2,\ldots,X_n$ are independent (as a block) if:

$$\forall x_1,\ldots,x_n\in\mathbb{R}\quad P(X_1=x_1,\,X_2=x_2,\,\ldots,\,X_n=x_n) = \prod_{i=1}^{n}P(X_i=x_i) \tag{7.153}$$

Theorem 4.31. The independence of two random variables implies that their covariance is zero (the converse is false!).

Proof 4.31.1. We prove this for random variables that take only a finite number of values, $\{x_i\}_{i\in I}$ and $\{y_j\}_{j\in J}$ respectively, with $I$ and $J$ finite sets. For the proof, let us recall that:

$$E(XY) = \sum_{i,j}P(X=x_i,\,Y=y_j)\,x_iy_j = \sum_{i,j}P(X=x_i)P(Y=y_j)\,x_iy_j = \sum_iP(X=x_i)\,x_i\sum_jP(Y=y_j)\,y_j = E(X)E(Y) \tag{7.154}$$

and therefore:

$$c_{X,Y} = E(XY)-E(X)E(Y) = E(X)E(Y)-E(X)E(Y) = 0 \tag{7.155}$$

Given that:

$$V(X+Y) = V(X)+V(Y)+2c_{X,Y} \tag{7.156}$$

and that $c_{X,Y}=0$ when $X$ and $Y$ are independent, we get:

$$V(X+Y) = V(X)+V(Y) \tag{7.157}$$

More generally, let $X_1,\ldots,X_n$ be independent (as a block). Then, for any discrete or continuous statistical distribution law (!), we have, in the two most common notations:

$$V\left(\sum_{i=1}^{n}X_i\right) = \sum_{i=1}^{n}V(X_i) \tag{7.158}$$

Or using the standard deviation:

$$\sigma_{\sum X_i} = \sqrt{\sum_{i=1}^{n}\sigma_{X_i}^2} \tag{7.159}$$

□ Q.E.D.

Remark
Intuitively, the smaller the covariance (close to zero), the weaker the linear dependence between the series. Conversely, the greater the covariance in absolute value, the stronger their linear dependence. Keep in mind, however, that the magnitude of the covariance depends on the units of measurement.

7.3.1.2.1 Anscombe's famous quartet

Anscombe's quartet comprises four datasets that have nearly identical elementary statistical properties. Yet they appear very different when graphed, or when analyzed with undergraduate statistics rather than high-school ones. Each dataset consists of eleven $(x,y)$ points.
They were constructed in 1973 by the statistician Francis Anscombe to demonstrate two things: the importance of graphing data before analyzing it, and the effect of outliers on statistical properties. The quartet is also used to test whether an analytical tool can be accepted as "statistics-compliant". The six corresponding statistics should be the minimum provided by any high-school level analytical tool!

The datasets are as follows. The x values are the same for the first three datasets:

I: x    y       | II: x    y       | III: x   y       | IV: x    y
10.0    8.04    | 10.0     9.14    | 10.0     7.46    | 8.0      6.58
8.0     6.95    | 8.0      8.14    | 8.0      6.77    | 8.0      5.76
13.0    7.58    | 13.0     8.74    | 13.0     12.74   | 8.0      7.71
9.0     8.81    | 9.0      8.77    | 9.0      7.11    | 8.0      8.84
11.0    8.33    | 11.0     9.26    | 11.0     7.81    | 8.0      8.47
14.0    9.96    | 14.0     8.10    | 14.0     8.84    | 8.0      7.04
6.0     7.24    | 6.0      6.13    | 6.0      6.08    | 8.0      5.25
4.0     4.26    | 4.0      3.10    | 4.0      5.39    | 19.0     12.50
12.0    10.84   | 12.0     9.13    | 12.0     8.15    | 8.0      5.56
7.0     4.82    | 7.0      7.26    | 7.0      6.42    | 8.0      7.91
5.0     5.68    | 5.0      4.74    | 5.0      5.73    | 8.0      6.89

Table 4.24 - Anscombe's quartet

The quartet is still often used to make two points. First, we should look at a dataset graphically before analyzing it according to a particular type of relation. Second, basic statistical properties are inadequate for describing realistic datasets.

With Microsoft Excel 14.0.7166 we get:
Statistic     | I: x      y        | II: x     y        | III: x    y        | IV: x     y
Mean          | 9.0000    7.5009   | 9.0000    7.5009   | 9.0000    7.5000   | 9.0000    7.5009
Median        | 9.0000    7.5800   | 9.0000    8.1400   | 9.0000    7.1100   | 8.0000    7.0400
Variance      | 10.0000   3.7521   | 10.0000   3.7524   | 10.0000   3.7478   | 10.0000   3.7484
Skewness      | 0.0000    -0.0650  | 0.0000    -1.3158  | 0.0000    1.8555   | 3.3166    1.5068
Kurtosis      | -1.2000   -0.5349  | -1.2000   0.8461   | -1.2000   4.3841   | 11.0000   3.1513
Correlation   | 0.8164             | 0.8162             | 0.8163             | 0.8165

Figure 4.74 - Anscombe's quartet Statistics Summary

With the elementary statistical indicators alone, it is almost impossible to detect a difference between the four datasets. But the skewness and the kurtosis change everything! The corresponding charts lead to the same conclusion:

Figure 4.75 - Anscombe's quartet Graphs Summary

7.3.1.3 Mean and Variance of the Average

In statistics it is often (verrrrry!) useful to determine the standard deviation of the sample mean. Working with it gives important analytical results in management and manufacturing. Let's see what it is!

Consider the average of a series of terms, each determined by measuring several values. This average is in fact the estimator of the mean in a particular case, as we will see later:

$$\bar{X} = \frac{1}{n}(X_1+X_2+\ldots+X_n) \tag{7.160}$$

Then, using the properties of the mean:

$$E(\bar{X}) = \frac{1}{n}\big(E(X_1)+E(X_2)+\ldots+E(X_n)\big) \tag{7.161}$$

and if all the random variables are independent and identically distributed, we have:

$$E(\bar{X}) = \frac{1}{n}(\mu+\mu+\ldots+\mu) = \frac{1}{n}\,n\mu = \mu \tag{7.162}$$

Remark
We will prove much further below the following result. If all the random variables are independent and identically distributed with finite variance, then the mean asymptotically follows what we name a "Normal distribution".

For the variance, the same reasoning applies. We use $V(aX)=a^2V(X)$ and, for independent random variables, the additivity of the variances (7.158):

$$V(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{1}{n^2}\big(V(X_1)+V(X_2)+\ldots+V(X_n)\big) = \frac{1}{n^2}\big(\sigma_1^2+\sigma_2^2+\ldots+\sigma_n^2\big) \tag{7.163}$$

And if the random variables are also identically distributed:

$$V(\bar{X}) = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n} \tag{7.164}$$

(We will study later the very important case, common in practice, where this last condition is not satisfied.)

We then get the "standard deviation of the mean", also named "standard error" or "non-systematic variation":

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \tag{7.165}$$

and this is strictly the standard deviation of the estimator of the mean!

Non-analytical workers, managers and chief executives often find it more intuitive to express the standard error in percent. This form is named the "Relative Standard Error (RSE)": the standard error expressed as a percentage of the mean, that is:

$$\text{RSE} = \frac{\sigma_{\bar{X}}}{\bar{X}} \tag{7.166}$$

The latter is quite useful when we have to deal with many variables with different units!!

The value of $\sigma_{\bar{X}}$ is available in many software packages, including Microsoft Excel charts, although Microsoft Excel has no built-in function for it. It is written either with the standard deviation (as above) or with the variance notation (we then only have to take the square root...).

Note that the last relation can be used even if the $n$ random variables do not all have the same mean! The main condition is just that they are independent with equal standard deviations, which is often the case in industry (production). We then have:

$$E(S_n) = n\mu,\quad E(M_n) = \mu,\quad V(S_n) = \sigma_{S_n}^2 = n\sigma^2,\quad V(M_n) = \sigma_{M_n}^2 = \frac{\sigma^2}{n} \tag{7.167}$$

where $S_n$ is the sum of $n$ independent identically distributed random variables and $M_n$ their estimated average.
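Relations (7.165) to (7.167) lend themselves to a quick numerical check. The Python sketch below is only an illustration, and its function names are hypothetical. It computes the standard error $\sigma/\sqrt{n}$ and the RSE of (7.166) as a fraction. It then compares the standard error with the empirical standard deviation of many simulated sample means. The simulation draws Normal values (via `random.gauss`) purely for convenience, since (7.164) holds for any i.i.d. law with finite variance.

```python
import math
import random
import statistics


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the mean of n i.i.d. variables of std sigma, as in (7.165)."""
    return sigma / math.sqrt(n)


def relative_standard_error(sigma: float, n: int, mean: float) -> float:
    """RSE of (7.166) as a fraction; multiply by 100 to express it in percent."""
    return standard_error(sigma, n) / mean


def simulated_std_of_mean(mu: float, sigma: float, n: int, trials: int, seed: int = 0) -> float:
    """Empirical std of the average of n Normal(mu, sigma) draws over many trials.

    The Normal law is an assumption made for convenience only.
    """
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(trials)]
    # pstdev uses the 1/n convention, consistent with the variance used in this section
    return statistics.pstdev(means)


if __name__ == "__main__":
    print(standard_error(2.0, 25))                     # about 0.4
    print(relative_standard_error(2.0, 25, 10.0))      # about 0.04, i.e. 4 %
    print(simulated_std_of_mean(10.0, 2.0, 25, 5000))  # should be close to 0.4
```

The simulated value fluctuates slightly with the seed and the number of trials. It approaches $\sigma/\sqrt{n}$ as the number of trials grows.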
The reduced centered variable that we introduced earlier:

$$Y = \frac{X-\mu}{\sigma} \tag{7.168}$$

can then be written in several very useful ways:

$$Y = \frac{S_n-n\mu}{\sigma\sqrt{n}} = \frac{M_n-\mu}{\sigma/\sqrt{n}} \tag{7.169}$$

We assume that the reader already knows what a Normal distribution $\mathcal{N}(\mu,\sigma)$ is. Because it is extremely important (!), we will show later in detail the law followed by $\bar{X}$, the average of $n$ independent and identically distributed random variables. This law holds exactly if the $X_i$ are themselves Normal, and asymptotically otherwise:

$$\bar{X} = \mathcal{N}\left(\mu,\frac{\sigma}{\sqrt{n}}\right) \tag{7.170}$$

7.3.1.4 Coefficient of Correlation

Now consider two random variables $X$ and $Y$ with covariance:

$$c_{X,Y} = E\big[(X-\mu_X)(Y-\mu_Y)\big] \tag{7.171}$$

Theorem 4.32. We have:

$$(c_{X,Y})^2 \leq V(X)V(Y) \tag{7.172}$$

We prove this relation right away because the covariance alone is not always convenient for data analysis: it is not bounded and is therefore hard to interpret. We will then construct an indicator that is easier to use in business.

Proof 4.32.1. We choose any constant $a$ and we calculate the variance of:

$$aX+Y \tag{7.173}$$

Using the properties of the variance and of the mean, we can immediately write:

$$V(aX+Y) = a^2V(X)+V(Y)+2a\,c_{X,Y} \tag{7.174}$$

The right-hand side is positive or zero for any $a$, since it equals a variance (left-hand side). Recall that any polynomial of degree two can be written in the canonical form:

$$P(x) = ax^2+bx+c = a\left[\left(x+\frac{b}{2a}\right)^2-\frac{b^2-4ac}{4a^2}\right]$$

so that, seen as a polynomial in $a$ (assuming $V(X)>0$):

$$P(a) = V(X)a^2+2c_{X,Y}\,a+V(Y) = V(X)\left[\left(a+\frac{2c_{X,Y}}{2V(X)}\right)^2-\frac{(2c_{X,Y})^2-4V(X)V(Y)}{4V(X)^2}\right] \tag{7.175}$$

Since $P(a)$ is positive or zero for every $a$, the discriminant must satisfy:

$$(2c_{X,Y})^2-4V(X)V(Y) \leq 0 \tag{7.176}$$

Therefore, after simplification:

$$(c_{X,Y})^2 \leq V(X)V(Y) \tag{7.177}$$

□ Q.E.D.

This also gives us:

$$\frac{(c_{X,Y})^2}{V(X)V(Y)} \leq 1 \Leftrightarrow \frac{|c_{X,Y}|}{\sqrt{V(X)V(Y)}} \leq 1 \tag{7.178}$$

Finally we get a statistical inequality named the "Cauchy-Schwarz inequality":

$$-1 \leq \frac{c_{X,Y}}{\sqrt{V(X)V(Y)}} \leq 1 \tag{7.179}$$
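Several covariance relations of this section can be checked numerically on any small sample:
• the covariance formula (7.133);
• the variance of a sum (7.128);
• the bilinearity example (7.145);
• the Cauchy-Schwarz bound (7.177).

The Python sketch below is only an illustration. It uses the equiprobable $1/n$ convention of (7.136). The helper names are hypothetical, and the four data lists are arbitrary illustrative values, not data from this book.

```python
import math


def mean(v):
    """Arithmetic mean in the equiprobable case."""
    return sum(v) / len(v)


def cov(u, v):
    """Covariance with the 1/n convention of (7.136)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def var(u):
    """Variance as cov(U, U), the Huyghens case (7.135)."""
    return cov(u, u)


def lin(a, u, b, v):
    """Observations of the linear combination aU + bV."""
    return [a * p + b * q for p, q in zip(u, v)]


def check_identities(x, y, z, t, tol=1e-9):
    """Return (formula, variance_of_sum, bilinearity, cauchy_schwarz) as booleans."""
    # (7.133): cov(X, Y) = E(XY) - E(X)E(Y)
    xy = [p * q for p, q in zip(x, y)]
    formula = math.isclose(cov(x, y), mean(xy) - mean(x) * mean(y), abs_tol=tol)
    # (7.128): V(X + Y + Z) = sum of variances + 2 * sum of pairwise covariances
    s = [p + q + r for p, q, r in zip(x, y, z)]
    rhs = var(x) + var(y) + var(z) + 2 * (cov(x, y) + cov(y, z) + cov(x, z))
    variance_of_sum = math.isclose(var(s), rhs, abs_tol=tol)
    # (7.145): cov(3X + 5Y, 4Z - 2T) expanded by bilinearity
    lhs = cov(lin(3, x, 5, y), lin(4, z, -2, t))
    rhs = 12 * cov(x, z) + 20 * cov(y, z) - 6 * cov(x, t) - 10 * cov(y, t)
    bilinearity = math.isclose(lhs, rhs, abs_tol=tol)
    # (7.177): Cauchy-Schwarz bound, with a small tolerance for rounding
    cauchy_schwarz = cov(x, y) ** 2 <= var(x) * var(y) + tol
    return formula, variance_of_sum, bilinearity, cauchy_schwarz


if __name__ == "__main__":
    # Arbitrary illustrative observations of X, Y, Z, T (equiprobable case)
    x = [1.0, 2.0, 4.0, 7.0]
    y = [2.0, 1.0, 5.0, 3.0]
    z = [0.0, 3.0, 1.0, 2.0]
    t = [5.0, 4.0, 4.0, 1.0]
    print(cov(x, y))                     # 1.625
    print(check_identities(x, y, z, t))  # (True, True, True, True)
```

The identities are exact algebraically, so any deviation comes only from floating-point rounding. This is why a small tolerance is used rather than strict equality.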
If the variances of $X$ and $Y$ are non-zero, the correlation between $X$ and $Y$ is defined by the "linear correlation coefficient". It is a standardized covariance, so its amplitude does not depend on the chosen unit of measure. It is written:

$$R_{X,Y} = \frac{c_{X,Y}}{\sqrt{V(X)V(Y)}} \tag{7.180}$$

It can also be written in an expanded form (using the Huyghens theorem):

$$R_{X,Y} = \frac{E(XY)-E(X)E(Y)}{\sqrt{V(X)V(Y)}} = \frac{E(XY)-E(X)E(Y)}{\sqrt{\big(E(X^2)-E(X)^2\big)\big(E(Y^2)-E(Y)^2\big)}} \tag{7.181}$$

or in a more condensed form:

$$R_{X,Y} = \frac{c_{X,Y}}{\sigma_X\sigma_Y} \tag{7.182}$$

Remark
The letter $R$ is normally reserved for an estimator of the correlation coefficient. The definition above is not an estimator (the variances don't have the small hat...), so strictly speaking, tradition would have us write $\rho_{X,Y}$.

Whatever the units and the orders of magnitude, the correlation coefficient is a dimensionless number between $-1$ and $1$. Its value therefore does not depend on the unit of measure, which is far from true for all statistical indicators! It reflects the degree of linear dependence between $X$ and $Y$. Geometrically, it reflects how closely the cloud of points is flattened around a straight line.

A correlation coefficient of zero, or close to 0, therefore means that there is no linear relation between the characters. But it says nothing about independence in the more general sense. When the correlation coefficient is near $1$ or $-1$, the characters are said to be strongly correlated.

We must be careful with the frequent confusion between correlation and causality. The fact that two phenomena are correlated does not imply in any way that one is the cause of the other.
Indeed, for any two correlated events $A$ and $B$, the different possible relationships include:

• $A$ causes $B$ (direct causation);
• $B$ causes $A$ (reverse causation);
• $A$ and $B$ are consequences of a common cause, but do not cause each other;
• $A$ causes $B$ and $B$ causes $A$ (bidirectional or cyclic causation);
• $A$ causes $C$, which causes $B$ (indirect causation);
• There is no connection between $A$ and $B$;
• The correlation is a coincidence.

Coming back to the mathematical aspect of the correlation:

• If $R_{X,Y}=-1$, we are dealing with a "pure negative correlation". All the measurement points lie on a straight line with a negative slope.
• If $-1<R_{X,Y}<0$ or $0<R_{X,Y}<1$, we are dealing with a negative or positive "imperfect correlation". The measurement points are scattered around a straight line with a negative or positive slope, respectively.
• If $R_{X,Y}=0$, the linear correlation is zero... There is no linear trend: the least-squares line through the measurement points has a slope of zero.
• If $R_{X,Y}=1$, we are dealing with a "pure positive correlation". All the measurement points lie on a straight line with a positive slope.

Analyzing the correlation coefficient aims to determine the degree of association between variables. It is often expressed through the coefficient of determination, which is the square of the correlation coefficient. The coefficient of determination thus measures how much one variable contributes to explaining the other.
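As an illustration of (7.180) to (7.182), the Python sketch below computes the linear correlation coefficient and the coefficient of determination of the four Anscombe datasets of Table 4.24. The helper names are hypothetical and only the standard library is used. The factor $1/n$ cancels between the covariance and the standard deviations, so the results should match the Correlation row of Figure 4.74 up to rounding.

```python
import math

# Anscombe's quartet, copied from Table 4.24 (x values shared by datasets I to III)
X123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def correlation(x, y):
    """Linear correlation coefficient R = c_XY / (sigma_X * sigma_Y), 1/n convention."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    if sx == 0 or sy == 0:
        raise ValueError("R is undefined when a variance is zero")
    return cxy / (sx * sy)


if __name__ == "__main__":
    for name, (x, y) in ANSCOMBE.items():
        r = correlation(x, y)
        print(f"{name}: R = {r:.4f}, R^2 = {r * r:.4f}")
```

The four nearly equal values of $R$ illustrate the point of the quartet: the coefficient alone cannot distinguish a linear cloud, a parabola, or a single outlier.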
We use the expressions of the mean and standard deviation of equiprobable variables demonstrated above. (Computing the correlation of two random variables is most meaningful when they are jointly Gaussian.) We start from:

$$R_{X,Y} = \frac{E(XY)-E(X)E(Y)}{\sqrt{\big(E(X^2)-E(X)^2\big)\big(E(Y^2)-E(Y)^2\big)}} \tag{7.183}$$

To obtain the estimator of the coefficient of correlation, we write:

$$R_{X,Y} = \frac{\dfrac{1}{n}\sum_{i=1}^{n}x_iy_i-\left(\dfrac{1}{n}\sum_{i=1}^{n}x_i\right)\left(\dfrac{1}{n}\sum_{i=1}^{n}y_i\right)}{\sqrt{\left[\dfrac{1}{n}\sum_{i=1}^{n}x_i^2-\left(\dfrac{1}{n}\sum_{i=1}^{n}x_i\right)^2\right]\left[\dfrac{1}{n}\sum_{i=1}^{n}y_i^2-\left(\dfrac{1}{n}\sum_{i=1}^{n}y_i\right)^2\right]}} \tag{7.184}$$

where we see that the covariance becomes the average of the products minus the product of the averages. Multiplying the numerator and the denominator by $n^2$, we get a famous expression:

$$R_{X,Y} = \frac{n\sum_{i=1}^{n}x_iy_i-\sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{\sqrt{n\sum_{i=1}^{n}x_i^2-\left(\sum_{i=1}^{n}x_i\right)^2}\,\sqrt{n\sum_{i=1}^{n}y_i^2-\left(\sum_{i=1}^{n}y_i\right)^2}} \tag{7.185}$$

The correlation coefficient can be calculated with the built-in CORREL() function in the English version of Microsoft Excel 11.8346 (and others).

We will see a more general expression of the correlation coefficient in the section Theoretical Computing.

Remarks
R1. In the literature, the experimental correlation coefficient is often named the "sample Pearson coefficient" (in the equiprobable case). Its square is named the "coefficient of determination".
R2. The square of the coefficient is often somewhat improperly interpreted as the % of variation in the response variable $Y$ explained by the explanatory variable $X$.

Finally, note the following relation, which is used a lot in practice (see the section Economics for famous detailed examples!):

$$V(X+Y) = V(X)+V(Y)+2c_{X,Y} = V(X)+V(Y)+2R_{X,Y}\sqrt{V(X)V(Y)} \tag{7.186}$$

or the even more famous version with the standard deviation:

$$\sigma_{X+Y} = \sqrt{\sigma_X^2+\sigma_Y^2+2R_{X,Y}\,\sigma_X\sigma_Y} \tag{7.187}$$

This relation often appears in finance, in the calculation of the VaR (Value at Risk) according to the RiskMetrics methodology proposed by JP Morgan (see section Economy).

Let us see a small application example of the correlation that has nothing to do with VaR (at least for the moment...).
Example: An airline company reserves 120 seats for connecting passengers who arrive on two earlier flights and continue to Frankfurt. The first flight arrives from Manila, and its number of passengers follows a Normal distribution with mean 50 and variance 169. The second flight arrives from Taipei, and its number of passengers follows a Normal distribution with mean 45 and variance 196. The linear correlation coefficient between the numbers of passengers of both flights was measured as:

$$R_{X,Y} = 0.5 \tag{7.188}$$

The statement lets us assume that the couple $(X,Y)$ also follows a (bivariate) Normal distribution. The number of passengers for Frankfurt then follows the law:

$$X+Y = \mathcal{N}(\mu_X+\mu_Y,\,\sigma_{X+Y}) \tag{7.189}$$

with:

$$\mu_X+\mu_Y = 50+45 = 95 \tag{7.190}$$

$$\sigma_{X+Y} = \sqrt{\sigma_X^2+\sigma_Y^2+2R_{X,Y}\,\sigma_X\sigma_Y} = \sqrt{169+196+2\cdot0.5\cdot\sqrt{169}\cdot\sqrt{196}} = \sqrt{547} \cong 23.39 \tag{7.191}$$

The 120 available seats are therefore only about $(120-95)/23.39 \cong 1.07$ standard deviations above the mean. The probability that more than 120 connecting passengers show up is thus roughly 14%. This is a bad start for customer satisfaction in the long term...

7.3.2 Continuous Variables and Moments

Definitions (#94):

1. We say that $X$ is a continuous variable if its "cumulative distribution function" (C.D.F., already defined above) is continuous. The distribution function of $X$ is defined, for $x\in\mathbb{R}$ or a truncated subset of $\mathbb{R}$, by:

$$F(x) = P(X\leq x) \tag{7.194}$$

that is, the cumulative probability that the random variable $X$ is smaller than or equal to the set value $x$. We also have, of course:

$$0 \leq F(x) \leq 1 \tag{7.195}$$

2. We denote by:

$$G(x) = 1-F(x) = P(X>x) \tag{7.196}$$

the "survival function" or "tail function".

3. Suppose furthermore that the distribution function $F$ of $X$ is continuously differentiable, with derivative $f$ (sometimes denoted by $p$). This derivative is named the "density function" (also, loosely, "mass function" or simply "distribution function"). We then say that $X$ is absolutely continuous, and in this case we have:

$$P(x_1\leq X\leq x_2) = \int_{x_1}^{x_2}f(x)\,dx = F(x_2)-F(x_1) \tag{7.197}$$

with the normalization condition:

$$P(X\leq+\infty) = \int_{-\infty}^{+\infty}f(x)\,dx = \int_{-\infty}^{+\infty}dF(x) = 1 \tag{7.198}$$

Any probability density function must satisfy this normalization integral over its domain of definition!

Remark
Interestingly, the definition implies that an absolutely continuous random variable takes any given exact value with probability zero! So an event of zero probability can still happen!!!

For a discrete variable the average is a sum weighted by probabilities; for a continuous variable it becomes an integral:

$$E(X) = \int_{-\infty}^{+\infty}x\,f(x)\,dx \tag{7.199}$$

and therefore the variance is written as:

$$V(X) = \int_{-\infty}^{+\infty}\big[x-E(X)\big]^2f(x)\,dx \tag{7.200}$$

The median is logically redefined for a continuous random variable by:

$$\int_{-\infty}^{M_e}f(x)\,dx = \int_{M_e}^{+\infty}f(x)\,dx = \frac{1}{2} \tag{7.201}$$

and it rarely coincides with the average! The modal value is the value of $x$ where $f$ reaches its maximum, that is where:

$$\frac{df(x)}{dx} = 0 \tag{7.202}$$

Statisticians often use the following notations for the expected mean of a continuous variable:

$$E(X),\;M(X),\;\mu_X,\;\mu \tag{7.203}$$

and for the variance:

$$V(X),\;S(X),\;\sigma_X^2,\;\sigma^2 \tag{7.204}$$

These are the same notations as for the moments of a discrete variable. Hereafter, we will calculate these moment indicators with detailed proofs only for the most used cases.

7.4 Fundamental postulate of statistics

One of the ultimate goals of statistics is to start from a sample and find the analytical distribution function that gave birth to it. This goal will be presented in this book as a postulate, although this assumption is very difficult to apply in practice.
Postulate: With any empirical distribution function $F_n(x)$ built from $n$ measurements of the random variable $x$, we can associate a theoretical distribution function $F(x)$. $F_n(x)$ converges to $F(x)$ when the sample size is large enough, in the following sense. Let:

$$X_n = \sup_x\left|F_n(x)-F(x)\right| \tag{7.205}$$

be the random variable defined as the largest difference (in absolute value) between $F_n(x)$ and $F(x)$, observed over all values of $x$ for a given sample. Then $X_n$ converges to 0 almost surely.

Remark
Mathematicians prove this postulate rigorously as a theorem named the "fundamental theorem of statistics" or the "Glivenko-Cantelli theorem". At the risk of offending the experts, we think this proof is not really one, because it is very far from practical reality (yes, this is our physicist side that emerges...). This theoretical result also leads many practitioners to do their utmost (excluding data, transformations and other abominations) to fit a known distribution law to their measured data.

7.5 Diversity Index

In biology or business, a statistician or analyst is sometimes asked to measure the diversity of a number of predefined elements.

For example, imagine a multinational with a range of well-defined products. Each store (customer) in the world can choose a subset of this range for its sales. The request is then to rank the stores by how wide a range of branded products they sell, also taking the quantities into account.

Suppose we have a total of 4 products in our catalog. By chance, three of our customers sell all 4 products. We would like to know which customer sells the greatest diversity, taking the quantities into account.
We have the following sales data by product for customer 1:

Customer 1
Product 1   5
Product 2   5
Product 3   5
Product 4   5

For customer 2:

Customer 2
Product 1   1
Product 2   1
Product 3   1
Product 4   17

and for customer 3:

Customer 3
Product 1   2
Product 2   2
Product 3   2
Product 4   34

A measure of information (diversity of states) well suited to this purpose is the mean given by the Shannon formula, introduced in the section Statistical Mechanics:

$$S(x) = E\big(h(x)\big) = -k\sum_{i=1}^{n}p_i\log(p_i) \tag{7.206}$$

Arbitrarily, we take $k=1$ and the logarithm in base 10. With 10 equiprobable states, for example, the entropy is then equal to 1. Therefore we have:

$$S(x) = -\sum_{i=1}^{n}p_i\log_{10}(p_i) \tag{7.207}$$

We now rewrite this in a form better suited to business applications. Let $n$ be the number of products, $f_i$ the sales of product $i$ and $N$ the total sales. The proportion (or "relative frequency") $p_i$ of the sales of product $i$ is then:

$$p_i = \frac{f_i}{N} \tag{7.208}$$

Then we have:

$$\begin{aligned}
S(x) &= -\sum_{i=1}^{n}\frac{f_i}{N}\log_{10}\left(\frac{f_i}{N}\right) = -\frac{1}{N}\sum_{i=1}^{n}f_i\big[\log_{10}(f_i)-\log_{10}(N)\big]\\
&= \frac{1}{N}\sum_{i=1}^{n}f_i\big[\log_{10}(N)-\log_{10}(f_i)\big] = \frac{\log_{10}(N)\sum_{i=1}^{n}f_i-\sum_{i=1}^{n}f_i\log_{10}(f_i)}{N}\\
&= \frac{N\log_{10}(N)-\sum_{i=1}^{n}f_i\log_{10}(f_i)}{N}
\end{aligned} \tag{7.209}$$

This gives for customer 1 (we stay in base 10 for the logarithm):

$$\begin{aligned}
S &= \frac{N\log(N)-\sum_{i=1}^{n}f_i\log(f_i)}{N} = \frac{20\log(20)-\big(5\log(5)+5\log(5)+5\log(5)+5\log(5)\big)}{20}\\
&= \frac{20\log(20)-20\log(5)}{20} = \log(20)-\log(5) = \log(4) \cong 0.602
\end{aligned} \tag{7.210}$$

which is the maximum possible value (each state is equally likely). For customer 2 we have:

$$S = \frac{20\log(20)-\big(1\log(1)+1\log(1)+1\log(1)+17\log(17)\big)}{20} \cong 0.255 \tag{7.211}$$

And finally for customer 3:

$$S = \frac{40\log(40)-\big(2\log(2)+2\log(2)+2\log(2)+34\log(34)\big)}{40} \cong 0.255 \tag{7.212}$$

Thus, the first customer has the greatest diversity.
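The computation of (7.209) to (7.212) is easy to automate. The Python sketch below uses $k=1$ and the base-10 logarithm as in the text, and its function name is hypothetical. It skips products with zero sales, whose contribution $f_i\log f_i$ tends to zero.

```python
import math


def shannon_diversity(counts):
    """Diversity index (7.209): (N log10 N - sum f_i log10 f_i) / N with k = 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one sale is required")
    # Zero counts are skipped because f * log10(f) -> 0 when f -> 0
    weighted = sum(f * math.log10(f) for f in counts if f > 0)
    return (total * math.log10(total) - weighted) / total


if __name__ == "__main__":
    customers = {1: [5, 5, 5, 5], 2: [1, 1, 1, 17], 3: [2, 2, 2, 34]}
    for number, sales in customers.items():
        print(f"customer {number}: S = {shannon_diversity(sales):.3f}")
```

Run on the three customers above, this prints about 0.602 for customer 1 and about 0.255 for customers 2 and 3.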
Customers 2 and 3 also reveal an interesting property of the Shannon formula: the quantity does not affect diversity. The only difference between the two customers is that the quantities are multiplied by a factor of 2, and the diversity is unchanged!

7.6 Distribution Functions (probability laws)

Suppose we observe probabilistic phenomena, note the values they take and report them graphically. The individual measurements then follow a typical pattern, which can sometimes be fitted theoretically with a good level of quality. In probability and statistics, we call these patterns "distribution functions", because they indicate how frequently the random variable takes given values.

Remark
We sometimes simply use the term "function" or "law" to describe these characteristics.

In practice, these functions are bounded by what we name the "range of the distribution". It is the difference between the maximum value (on the right) and the minimum value (on the left) of the observed values:

$$R = \max-\min \tag{7.213}$$

In theory, these functions are not necessarily bounded. We then talk (see section Functional Analysis) about a "domain of definition", or more simply about the "support" of the function.

If the observed values are distributed in a certain way, then each value of the distribution function has an associated probability (or "cumulative probability" for continuous distribution functions).

In industrial practice (see section Industrial Engineering), the range of statistical values is important, as is the standard deviation, because it indicates the variation (variability) of a process.
Let $\mathcal{L}$ denote any univariate distribution function. The law is simply denoted by $\mathcal{L}$ if its domain of definition is $\mathbb{R}$. If the domain is bounded, you will typically see it written as a subscript interval, such as $\mathcal{L}_{]a,b]}$.

Definitions (#95):
D1. The mathematical relation that gives the probability of a given value of a random variable is named the "density function" (or "probability density function"), "mass function" or "marginal function".
D2. The mathematical relation that gives the cumulative probability that a random variable is lower than or equal to a certain value is named the "repartition function", "cumulative function" or "cumulative distribution function".
D3. Random variables are "independent and identically distributed (i.i.d.)" if they are independent and all follow the same distribution function with the same parameter values.

Such functions are very numerous, so we offer the reader here a detailed study of only the best known ones.

Before going any further, note that the literature uses several notations to indicate that a continuous or discrete random variable $X$ follows a given probability distribution $\mathcal{L}$. Here are the most common:

$$X\sim\mathcal{L}\qquad X\doteq\mathcal{L}\qquad X\hookrightarrow\mathcal{L}\qquad X=\mathcal{L} \tag{7.214}$$

In this section, and throughout the book in general, we will use the last notation!
The following list covers three groups: the distribution functions studied here, those commonly used in industry but treated in other chapters/sections, and those whose proof is still to be written:

• Discrete Uniform Distribution $\mathcal{U}(a,b)$ (see below)
• Bernoulli Distribution $\mathcal{B}(1,p)$ (see below)
• Geometric Distribution $\mathcal{G}(N)$ (see below)
• Binomial Distribution $\mathcal{B}(N,k)$ (see below)
• Negative Binomial Distribution $\mathcal{NB}(N,k,p)$ (see below)
• Hypergeometric Distribution $\mathcal{H}(n,p,m,k)$ (see below)
• Multinomial Distribution (see below)
• Poisson Distribution $\mathcal{P}(\mu,k)$ (see below)
• Gauss-Laplace/Normal Distribution $\mathcal{N}(\mu,\sigma)$ (see below)
• Log-Normal Distribution $\mathcal{LN}(\mu,\sigma)$ (see below)
• Continuous Uniform Distribution (see below)
• Triangular Distribution (see below)
• Pareto Distribution (see below)
• Exponential Distribution (see below)
• Weibull Distribution (see section Industrial Engineering)
• Generalized Exponential Distribution (see section Theoretical Computing)
• Erlang/Erlang-B/Erlang-C Distributions (see section Quantitative Management)
• Cauchy Distribution (see below)
• Beta Distribution (see below and section Quantitative Management)
• Gamma Distribution (see below)
• Chi-2 Distribution (see below)
• Student Distribution (see below)
• Fisher-Snedecor Distribution (see below)
• Benford Distribution (see below)
• Logistic Distribution (see section Theoretical Computing)
• Square Gauss Distribution (still to be written)
• Extreme Value Distribution (still to be written)

Remark
The reader will find the mathematical developments of the Weibull distribution function in the section Industrial Engineering (Engineering chapter). Those of the logistic distribution function are in the section Theoretical Computing.
7.6.1 Discrete Uniform Distribution
If we accept that it is possible to associate a probability with an event, we can conceive of situations where we can assume a priori that all elementary events are equally likely (that is to say, they have the same probability of occurring). We then use the ratio between the number of favorable cases and the number of possible cases to calculate the probability of any event in the universe of events $U$. More generally, if $U$ is a finite set of equally likely events and $A$ is a subset of $U$, then, using set theory notation (see section Set Theory), we have:
$$P(A)=\frac{\mathrm{Card}(A)}{\mathrm{Card}(U)}=\frac{\#A}{\#U}\tag{7.215}$$
More commonly, if $e$ is an event that may have $N$ equally likely possible outcomes, then the probability of observing a given outcome of this event follows a "discrete uniform function" (or "discrete uniform law") given by:
$$P(X=x_i)=p_e=\frac{1}{N}\tag{7.216}$$
whose mean (or average) is given by:
$$E(X)=\sum_{i=1}^{N}p_ix_i=\sum_{i=1}^{N}p_ex_i=p_e\sum_{i=1}^{N}x_i=\frac{1}{N}\sum_{i=1}^{N}x_i=\bar{x}\tag{7.217}$$
In the particular case where $x_i=i$ with $i=1,\ldots,N$, we then have (see section Sequences and Series):
$$E(X)=\frac{1}{N}\sum_{i=1}^{N}i=\frac{1}{N}\cdot\frac{N(N+1)}{2}=\frac{N+1}{2}\tag{7.218}$$
If the random variable takes all the integer values of $[a,b]$ (another special case), the distribution is denoted $\mathcal{U}(a,b)$, there are $N=b-a+1$ equally likely values, and for the expected mean we have:
$$E(X)=\sum_{i=a}^{b}p\,i=p\sum_{i=a}^{b}i=\frac{1}{b-a+1}\left(\sum_{i=1}^{b}i-\sum_{i=1}^{a-1}i\right)=\frac{1}{b-a+1}\left(\frac{b(b+1)}{2}-\frac{(a-1)a}{2}\right)=\frac{1}{b-a+1}\cdot\frac{(b-a+1)(b+a)}{2}=\frac{a+b}{2}\tag{7.219}$$
since $b(b+1)-a(a-1)=(b+a)(b-a+1)$.
For the variance we have, with $\mu=(N+1)/2$ and again using the results of the section on Sequences and Series:
$$\begin{aligned}V(X)&=\sum_{i=1}^{N}p_i(i-\mu)^2=\frac{1}{N}\sum_{i=1}^{N}\left(i^2-2\mu i+\mu^2\right)=\frac{1}{N}\left(\sum_{i=1}^{N}i^2-2\mu\sum_{i=1}^{N}i+N\mu^2\right)\\&=\frac{(N+1)(2N+1)}{6}-(N+1)\mu+\mu^2=\frac{(N+1)(2N+1)}{6}-\frac{(N+1)^2}{2}+\frac{(N+1)^2}{4}\\&=\frac{(N+1)(2N+1)}{6}-\frac{(N+1)^2}{4}=\frac{2(N+1)(2N+1)-3(N+1)^2}{12}\\&=\frac{4N^2+6N+2-3N^2-6N-3}{12}=\frac{N^2-1}{12}\end{aligned}\tag{7.220}$$
If the random variable takes all the integer values of $[a,b]$, so that the distribution is $\mathcal{U}(a,b)$, it should be obvious that for the variance we have:
$$V(X)=\frac{N^2-1}{12}=\frac{(b-a+1)^2-1}{12}\tag{7.221}$$
By symmetry of the distribution, if all the values of the domain of definition $[a,b]$ are taken by the random variable, we have for the median:
$$M_e=E(X)=\frac{a+b}{2}\tag{7.222}$$
Here is an example plot of the mass function and of the cumulative distribution function of a discrete uniform law on the values $\{1,5,8,11,12\}$ (we see that each value is equally likely):
Figure 4.76 - Uniform law $\mathcal{U}$ (density and cumulative distribution function)
As we can see in the diagram above, the cumulative distribution function can be written:
$$F(x)=P(X\leq x)=\frac{\#\{i:x_i\leq x\}}{N}\tag{7.223}$$
Remark
The discrete uniform distribution has no specific modal value $M_0$!

7.6.2 Bernoulli Distribution
When we deal with a binary observation, the probability of an event remains constant from one observation to the next if there is no memory effect (in other words, we are dealing with a sum of pairwise independent Bernoulli variables). Observations of this kind, where the random variable takes the value 0 (false) or 1 (true) with probabilities $q=1-p$ and $p$ respectively, are called "Bernoulli trials", with "contrary events with contrary probabilities".
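The closed forms (7.219) and (7.221) of the discrete uniform law, and the Bernoulli trials just described, lend themselves to a quick simulation check. The following is a minimal Python sketch, not part of the original text: the function names, the fixed seed and the sample size are illustrative assumptions. It draws integers from $\mathcal{U}(a,b)$ with `random.Random.randint` (which includes both end points), compares the sample mean and variance with the closed forms, and then counts the frequency of the value 1 in simulated Bernoulli trials.

```python
import random

def simulate_uniform(a, b, trials, rng):
    """Sample mean and (population) variance of draws from U(a, b) on the integers a..b."""
    draws = [rng.randint(a, b) for _ in range(trials)]  # randint includes both a and b
    mean = sum(draws) / trials
    var = sum((x - mean) ** 2 for x in draws) / trials
    return mean, var

def uniform_closed_form(a, b):
    """Mean (a+b)/2 from (7.219) and variance ((b-a+1)^2 - 1)/12 from (7.221)."""
    n = b - a + 1
    return (a + b) / 2, (n * n - 1) / 12

def simulate_bernoulli(p, trials, rng):
    """Observed frequency of the value 1 in independent Bernoulli(p) trials."""
    ones = sum(1 for _ in range(trials) if rng.random() < p)  # value 1 with probability p
    return ones / trials

rng = random.Random(2017)  # arbitrary fixed seed so that a run is reproducible
print(simulate_uniform(3, 10, 100_000, rng), uniform_closed_form(3, 10))
print(simulate_bernoulli(0.3, 100_000, rng))
```

With 100,000 draws the simulated values should land close to the closed forms; the gap shrinks roughly like one over the square root of the sample size.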
Thus, a random variable $X$ follows a "Bernoulli function" $\mathcal{B}(1,p)$ (or "Bernoulli law") if it can take only the values 0 or 1, with probabilities $q$ and $p$ respectively, so that $q+p=1$ and:
$$P(X=0)=q\qquad P(X=1)=p=1-q\tag{7.224}$$
The classic example of such a process is the game of heads or tails, or sampling with replacement (or sampling that can be considered as such, a case that is very important in industrial practice). There is certainly no need for the reader to verify formally that the total probability is equal to 1...
Remark
The introduction above is perhaps not relevant for business, but we will see in the section of Quantitative Techniques that the Bernoulli function naturally appears at the beginning of our study of queuing theory.
Note that, by extension, if we consider $N$ events in which we get, in a particular order, $k$ times one of the possible outcomes (success) and $N-k$ times the other (failure), then the probability of such a series ($k$ successes and $N-k$ failures ordered in one particular way) is given by:
$$P(N,k)=p^k(1-p)^{N-k}=p^kq^{N-k}\tag{7.225}$$
with $N\in\mathbb{N}^*$, in accordance with what we obtained during the study of combinatorics in the section of Probabilities!
Here is an example plot of the cumulative distribution function for $q=0.3$:
Figure 4.77 - Bernoulli law $\mathcal{B}$ (cumulative distribution function)
The Bernoulli function therefore has for expected mean (average), choosing $p$ as the probability of the event of interest:
$$\mu=E(X)=\sum_ip_ix_i=p\cdot 1+(1-p)\cdot 0=p\tag{7.226}$$
and for variance (we use the Huygens theorem proved above, with $E(X^2)=p$ since $X^2=X$):
$$V(X)=\sigma^2=E(X^2)-E(X)^2=p-p^2=p(1-p)=pq\tag{7.227}$$
The modal value $M_0$ of the Bernoulli law depends on the values of $p$ and $q$.
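Before turning to the mode, the mean (7.226), the variance (7.227) and the probability (7.225) of one particular ordered series can be checked with exact rational arithmetic. This is a minimal Python sketch with illustrative function names of our own (they are not from the text); `fractions.Fraction` keeps every result exact, so it can be compared with $p$ and $pq$ without any rounding.

```python
from fractions import Fraction

def bernoulli_moments(p):
    """Mean and variance of B(1, p) computed directly from the definition (7.224)."""
    q = 1 - p
    law = {0: q, 1: p}                                     # P(X=0)=q, P(X=1)=p
    mean = sum(x * prob for x, prob in law.items())        # (7.226)
    second = sum(x * x * prob for x, prob in law.items())
    return mean, second - mean ** 2                        # Huygens: E(X^2) - E(X)^2, (7.227)

def ordered_series_probability(p, n, k):
    """Probability of one given ordered series of k successes and n-k failures (7.225)."""
    return p ** k * (1 - p) ** (n - k)

p = Fraction(3, 10)
print(bernoulli_moments(p))                  # exact mean p and variance pq
print(ordered_series_probability(p, 5, 2))   # p^2 q^3
```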
So we have (this may be obvious to the reader):
$$M_0=\begin{cases}0&\text{if }p<q\\\{0,1\}&\text{if }p=q\\1&\text{if }p>q\end{cases}\tag{7.228}$$
Remark
The Bernoulli distribution has no specific median value $M_e$!

7.6.3 Geometric Distribution
The geometric law $\mathcal{G}(N)$, or "Pascal's law", consists of Bernoulli trials, with a constant probability of success $p$ and of failure $q=1-p$, repeated independently until the first success.
Remember that during our presentation of the Bernoulli law we deduced an extension to $N$ trials such that:
$$P(N,k)=p^k(1-p)^{N-k}=p^kq^{N-k}\tag{7.229}$$
Therefore the probability of getting the first success ($k=1$) at the $N$-th trial, that is after $N-1$ failures, is:
$$\mathcal{G}(N)=p(1-p)^{N-1}=pq^{N-1}\qquad\text{with }N\in\mathbb{N}^*\tag{7.230}$$
As you can see, the greater $N$ is, the smaller the probability $\mathcal{G}(N)$ is. This may seem illogical, but in fact it is not! Indeed, in the sentence "the probability of getting the first success after $N$ trials", you must not forget that it is written after and not during: the first $N-1$ trials must all fail. Therefore the probability of having $N-1$ failures followed by 1 success always gets smaller when $N$ increases (the figure a little further below, for $p=0.5$, can help to understand).
This law has for expected mean:
$$\mu=E(X)=\sum_ip_ix_i=\sum_{N=1}^{+\infty}pq^{N-1}N=\sum_{N=1}^{+\infty}p(1-p)^{N-1}N=p\sum_{N=1}^{+\infty}(1-p)^{N-1}N\tag{7.231}$$
However, the last sum can also be written:
$$\sum_{N=1}^{+\infty}Nq^{N-1}=\frac{1}{(1-q)^2}\tag{7.232}$$
Indeed, we proved in the section of Sequences and Series, during our study of geometric series, that:
$$\sum_{k=0}^{n}q^k=\frac{1-q^{n+1}}{1-q}\tag{7.233}$$
Taking the limit $n\rightarrow+\infty$ we get:
$$\sum_{k=0}^{+\infty}q^k=\frac{1}{1-q}\tag{7.234}$$
because $0<q<1$. Then we just differentiate both sides of this equality with respect to $q$ and we get:
$$\sum_{k=1}^{+\infty}kq^{k-1}=\frac{1}{(1-q)^2}\tag{7.235}$$
This done, let us continue.
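The identity (7.235) and the normalization of the geometric law (7.230) are easy to check numerically. Here is a minimal Python sketch with illustrative function names of our own; the infinite series is truncated after a fixed number of terms, an assumption that is harmless here because $q^{k}$ decays geometrically for $0<q<1$.

```python
import math

def geometric_pmf(n, p):
    """G(N) = p q^(N-1): probability that the first success occurs at trial N (7.230)."""
    return p * (1 - p) ** (n - 1)

def derivative_series(q, terms=10_000):
    """Truncated sum of k q^(k-1) for k >= 1, to be compared with 1/(1-q)^2 (7.235)."""
    return sum(k * q ** (k - 1) for k in range(1, terms + 1))

for q in (0.1, 0.5, 0.9):
    # Truncated series versus the closed form obtained by differentiation
    print(q, derivative_series(q), 1 / (1 - q) ** 2,
          math.isclose(derivative_series(q), 1 / (1 - q) ** 2))

# The probabilities G(1), G(2), ... sum to 1 (up to the truncation).
print(sum(geometric_pmf(n, 0.5) for n in range(1, 200)))
```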
We then have the average number of trials $X$ it takes to get the first success (in other words, the expected rank, i.e. the expected number of trials, of the first success):
$$E(X)=\sum_{N=1}^{+\infty}N\,P(X=N)=\sum_{N=1}^{+\infty}Npq^{N-1}=\frac{p}{(1-q)^2}=\frac{p}{p^2}=\frac{1}{p}\tag{7.236}$$
Now we calculate the variance, recalling once again the Huygens theorem:
$$V(X)=E(X^2)-E(X)^2\tag{7.237}$$
So let us start by calculating $E(X^2)$:
$$E(X^2)=\sum_{N=1}^{+\infty}N^2P(X=N)=p\sum_{N=1}^{+\infty}N^2q^{N-1}=p\sum_{N=1}^{+\infty}N(N-1+1)q^{N-1}=p\sum_{N=1}^{+\infty}N(N-1)q^{N-1}+p\sum_{N=1}^{+\infty}Nq^{N-1}\tag{7.238}$$
The last term of this expression is equal to the expected mean calculated previously. Thus:
$$p\sum_{N=1}^{+\infty}Nq^{N-1}=\frac{1}{p}\tag{7.239}$$
It remains to calculate:
$$p\sum_{N=1}^{+\infty}N(N-1)q^{N-1}\tag{7.240}$$
We have (the term $N=1$ being zero):
$$p\sum_{N=1}^{+\infty}N(N-1)q^{N-1}=pq\sum_{N=2}^{+\infty}N(N-1)q^{N-2}\tag{7.241}$$
But by differentiating the following equality with respect to $q$:
$$\sum_{N=1}^{+\infty}Nq^{N-1}=\frac{1}{(1-q)^2}\tag{7.242}$$
we get:
$$\sum_{N=2}^{+\infty}N(N-1)q^{N-2}=\frac{2}{(1-q)^3}\tag{7.243}$$
Therefore:
$$pq\sum_{N=2}^{+\infty}N(N-1)q^{N-2}=pq\frac{2}{(1-q)^3}=\frac{2pq}{p^3}=\frac{2q}{p^2}\tag{7.244}$$
Thus:
$$E(X^2)=\frac{2q}{p^2}+\frac{1}{p}\tag{7.245}$$
Finally, we get the variance of the rank of the first success (i.e. the variance of the number of trials needed to obtain the first success):
$$V(X)=\sigma^2=E(X^2)-E(X)^2=\frac{2q}{p^2}+\frac{1}{p}-\frac{1}{p^2}=\frac{2q+p-1}{p^2}=\frac{q}{p^2}\tag{7.246}$$
The modal value is easy to get, because we need to find the value of $N$ that maximizes the geometric law:
$$pq^{N-1}\tag{7.247}$$
and we hope it is immediate to the reader that, since $0<q<1$, this is satisfied for $N=1$, therefore:
$$M_0=1\qquad\text{with}\qquad\mathcal{G}(1)=pq^{1-1}=pq^0=p\tag{7.248}$$
Now let us determine the median $M_e$ to finish. For this, by definition we must have:
$$\sum_{N=1}^{M_e}pq^{N-1}=\sum_{N=M_e}^{+\infty}pq^{N-1}=0.5\tag{7.249}$$
But we can rewrite:
$$\sum_{N=M_e}^{+\infty}pq^{N-1}=pq^{M_e-1}\sum_{N=0}^{+\infty}q^N=pq^{M_e-1}\frac{1}{1-q}=pq^{M_e-1}\frac{1}{p}=q^{M_e-1}=0.5\tag{7.250}$$
Therefore (in base 10):
$$\log\left(q^{M_e-1}\right)=(M_e-1)\log(q)=\log(0.5)\tag{7.251}$$
Finally, based on our definition of the median, we get:
$$M_e=\frac{\log(0.5)}{\log(q)}+1\tag{7.252}$$
Now we determine the cumulative distribution function of the geometric law. We start from:
$$\mathcal{G}(N)=pq^{N-1}\tag{7.253}$$
Then, by definition, the cumulative probability that the experiment succeeds within the first $N$ trials is:
$$P(X\leq N)=1-\sum_{j=N+1}^{+\infty}pq^{j-1}=1-p\sum_{j=N+1}^{+\infty}q^{j-1}\tag{7.254}$$
with $N$ being of course an integer taking the values 0, 1, 2, .... We write:
$$j-1=N+k\Rightarrow k=j-N-1\tag{7.255}$$
We then have for the CDF:
$$P(X\leq N)=1-p\sum_{k=0}^{+\infty}q^{N+k}=1-pq^N\sum_{k=0}^{+\infty}q^k=1-pq^N\frac{1}{1-q}=1-(1-q)q^N\frac{1}{1-q}=1-q^N\tag{7.256}$$
Example:
Late at night and in the dark, you try to open a lock with a bunch of five keys. Because you are a little tired (or a little tipsy...), you do not pay attention and try the keys at random each time. Knowing that only one key works, what is the probability of using the right key at the $N$-th attempt?
The solution is:
$$P(X=N)=\frac{1}{5}\left(\frac{4}{5}\right)^{N-1}\tag{7.257}$$
Plot of the mass function and cumulative distribution function of the geometric distribution with parameter $p=0.5$:
Figure 4.78 - Geometric law $\mathcal{G}$ (mass and cumulative distribution function)

7.6.4 Binomial Distribution
We now come back to our Bernoulli experiment. More generally, any particular $N$-tuple consisting of $k$ successes and $N-k$ failures has the probability (for sampling with replacement, or without replacement if the population is large... as a first approximation):
$$P(N,k)=p^k(1-p)^{N-k}=p^kq^{N-k}\tag{7.258}$$
of being drawn (or of appearing), whatever the order of appearance of successes and failures (the reader will perhaps have noticed that this is a generalization of the geometric distribution: just set $k=1$ to find the geometric distribution back).
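The mean (7.236), the variance (7.246), the cumulative distribution function (7.256) and the median formula (7.252) of the geometric law can be cross-checked numerically. This is a hedged Python sketch with names of our own choosing; the infinite sums are truncated (an assumption, harmless for the values of $p$ used here), and `geometric_median` simply evaluates (7.252) as a real number without rounding it to an integer rank. The last function encodes the five-keys example (7.257) with $p=1/5$.

```python
import math

def geometric_moments(p, terms=5_000):
    """Mean and variance of the rank of the first success, by truncated summation."""
    q = 1 - p
    mean = sum(n * p * q ** (n - 1) for n in range(1, terms + 1))
    second = sum(n * n * p * q ** (n - 1) for n in range(1, terms + 1))
    return mean, second - mean ** 2          # Huygens theorem (7.237)

def geometric_cdf(n, p):
    """P(X <= N) = 1 - q^N, equation (7.256)."""
    return 1 - (1 - p) ** n

def geometric_median(p):
    """Real-valued M_e = log(0.5)/log(q) + 1, equation (7.252); not rounded."""
    return math.log(0.5) / math.log(1 - p) + 1

def right_key_at_attempt(n, keys=5):
    """Five-keys example (7.257): keys tried at random, only one works, so p = 1/keys."""
    p = 1 / keys
    return p * (1 - p) ** (n - 1)

p = 0.2
print(geometric_moments(p), (1 / p, (1 - p) / p ** 2))   # compare with 1/p and q/p^2
print(geometric_cdf(5, p), sum(p * (1 - p) ** (n - 1) for n in range(1, 6)))
print(geometric_median(p), right_key_at_attempt(3))
```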
But combinatorics gives the number of $N$-tuples of this type (the number of ways to order the appearance of failures and successes). The number of possible arrangements is, as we proved (see section Probabilities), given by the binomial coefficient (we recall that the notation used in this book does not comply with ISO standard 31-11):
$$C_N^k=\frac{N!}{k!(N-k)!}\tag{7.259}$$
So, as the probability of obtaining a given series of $k$ successes and $N-k$ failures is always the same (regardless of the order), we just have to multiply the probability of one particular series by the binomial coefficient (which is equivalent to a sum):
$$P(N,k)=C_N^kp^k(1-p)^{N-k}=C_N^kp^kq^{N-k}\tag{7.260}$$
to get the total probability of obtaining any one of these possible series (since each of them is possible).
Remark
This is equivalent to the study of sampling with simple replacement (see section Probabilities) with a constraint on the order, or to the study of a series of successes and failures. We will use this relation in the context of queuing theory and reliability (see section Industrial Engineering). Note that in the case of large populations, even if the sampling is without replacement, it can be considered as sampling with replacement...
Written in another way, this gives the "binomial function" (or "binomial law"), also known as the following distribution function:
$$\mathcal{B}(N,k)=C_N^kp^k(1-p)^{N-k}=C_N^kp^kq^{N-k}=\binom{N}{k}p^kq^{N-k}\tag{7.261}$$
sometimes also denoted by $\mathcal{B}(n,p)$, with a lowercase $n$ or an uppercase $N$ (it does not really matter...); it can be calculated in the English version of Microsoft Excel 11.8346 using the BINOMDIST() function.
We sometimes say that the binomial law is not exhaustive, as the size of the initial population does not appear in the expression of the law.
Remark
The binomial distribution is called the "symmetric binomial distribution" when $p=0.5$.
Example:
We want to test the alternator of a generator.
The probability of failure on demand of this equipment is estimated at 1 failure per 1,000 starts. We decide to test 100 starts. The probability of observing exactly one failure in this test is:
$$\mathcal{B}(N=100,k=1)=C_N^kp^k(1-p)^{N-k}=\frac{N!}{k!(N-k)!}p^k(1-p)^{N-k}=\frac{100!}{1!(100-1)!}\left(\frac{1}{1000}\right)^1\left(\frac{999}{1000}\right)^{99}\cong 9\%\tag{7.262}$$
For the cumulative distribution function we obviously have (very useful in practice for supplier batch control or reliability, as we will see in the section of Industrial Engineering!):
$$\sum_{k=0}^{N}C_N^kp^k(1-p)^{N-k}=1\tag{7.263}$$
Indeed, we proved in the section of Calculus the "binomial theorem":
$$(x+y)^n=\sum_{k=0}^{n}C_n^kx^ky^{n-k}\tag{7.264}$$
Therefore:
$$\sum_{k=0}^{N}C_N^kp^k(1-p)^{N-k}=(p+(1-p))^N=1^N=1\tag{7.265}$$
Rather than calculating such cumulative probabilities by hand, it is better to use Microsoft Excel 11.8346 (or any other widely known software): BINOMDIST() with its last argument set to TRUE returns cumulative binomial probabilities, and CRITBINOM() solves the inverse problem, so there is no need to bother calculating these values.
The expected mean (average) of $\mathcal{B}(N,k)$ is given by:
$$E(X)=\sum_{k=0}^{N}k\,P(X=k)=\sum_{k=0}^{N}kC_N^kp^k(1-p)^{N-k}=\sum_{k=1}^{N}kC_N^kp^k(1-p)^{N-k}\tag{7.266}$$
But since:
$$kC_N^k=k\frac{N!}{k!(N-k)!}=N\frac{(N-1)!}{(k-1)!(N-k)!}=NC_{N-1}^{k-1}\tag{7.267}$$
we finally get:
$$\begin{aligned}E(X)&=\sum_{k=1}^{N}kC_N^kp^k(1-p)^{N-k}=N\sum_{k=1}^{N}C_{N-1}^{k-1}p^k(1-p)^{N-k}=Np\sum_{k=1}^{N}C_{N-1}^{k-1}p^{k-1}(1-p)^{(N-1)-(k-1)}\\&=Np\sum_{k=0}^{N-1}C_{N-1}^{k}p^{k}(1-p)^{(N-1)-k}=Np\,(p+(1-p))^{N-1}=Np\end{aligned}\tag{7.268}$$
which gives the average number of times that we will get the desired outcome of probability $p$ after $N$ trials.
The mean of the binomial distribution is sometimes written in the specialized literature as follows, if $r$ is the number of expected outcomes in a population of size $n$:
$$E(X)=Np=N\frac{r}{n}$$
Before calculating the variance, we need to introduce the following equality:
$$\sum_{s=0}^{N-1}s\frac{(N-1)!}{s!(N-1-s)!}p^sq^{N-1-s}=(N-1)p\tag{7.269}$$
Indeed, let us prove this relation using the previous developments:
$$\sum_{s=0}^{N-1}s\frac{(N-1)!}{s!(N-1-s)!}p^sq^{N-1-s}=\sum_{s=1}^{N-1}\frac{(N-1)!}{(s-1)!(N-1-s)!}p^sq^{N-1-s}=(N-1)\sum_{s=1}^{N-1}\frac{(N-2)!}{(s-1)!(N-1-s)!}p^sq^{N-1-s}\tag{7.270}$$
and, setting $j=s-1$:
$$(N-1)\sum_{s=1}^{N-1}\frac{(N-2)!}{(s-1)!(N-1-s)!}p^sq^{N-1-s}=(N-1)p\sum_{j=0}^{N-2}\frac{(N-2)!}{j!(N-2-j)!}p^jq^{N-2-j}=(N-1)p\sum_{j=0}^{N-2}C_{N-2}^jp^jq^{N-2-j}\tag{7.271}$$
We recognize in the last sum the cumulative distribution function, which is equal to 1. Therefore:
$$(N-1)p\sum_{j=0}^{N-2}C_{N-2}^jp^jq^{N-2-j}=(N-1)p\cdot 1=(N-1)p\tag{7.272}$$
We now start the (long) calculation of the variance of the binomial distribution, using the previous results with $\mu=Np$:
$$\begin{aligned}V(X)&=\sum_{k=0}^{N}(k-\mu)^2\frac{N!}{k!(N-k)!}p^kq^{N-k}\\&=\sum_{k=0}^{N}k^2\frac{N!}{k!(N-k)!}p^kq^{N-k}-2\mu\sum_{k=0}^{N}k\frac{N!}{k!(N-k)!}p^kq^{N-k}+\mu^2\sum_{k=0}^{N}\frac{N!}{k!(N-k)!}p^kq^{N-k}\\&=\sum_{k=1}^{N}k\frac{N!}{(k-1)!(N-k)!}p^kq^{N-k}-2\mu\cdot\mu+\mu^2\cdot 1\\&=-\mu^2+Np\sum_{s=0}^{N-1}(s+1)\frac{(N-1)!}{s!(N-1-s)!}p^sq^{N-1-s}\qquad(s=k-1)\\&=-\mu^2+Np\sum_{s=0}^{N-1}sC_{N-1}^sp^sq^{(N-1)-s}+Np\sum_{s=0}^{N-1}C_{N-1}^sp^sq^{(N-1)-s}\\&=-\mu^2+Np(N-1)p+Np\cdot 1=-N^2p^2+N^2p^2-Np^2+Np=Np(1-p)=Npq\end{aligned}\tag{7.273}$$
Finally:
$$V(X)=\sigma^2=Np(1-p)=Npq\tag{7.274}$$
The standard deviation of the binomial distribution is sometimes written in the specialized literature as follows, if $r$ is the number of expected outcomes in a population of size $n$ and $s=n-r$ the number of the other ones:
$$\sigma=\sqrt{Npq}=\sqrt{N\frac{r}{n}\cdot\frac{s}{n}}=\frac{1}{n}\sqrt{Nrs}\tag{7.275}$$
Here is an example plot of the binomial distribution $\mathcal{B}(10,0.5)$ and of its cumulative distribution function:
Figure 4.79 - Binomial law $\mathcal{B}$ (mass and cumulative distribution function)
It can be useful to note that some employees in companies normalize the calculation of the mean and standard deviation to the unit of $N$. We then have:
$$\frac{\mu}{N}=\frac{Np}{N}=p\qquad V\!\left(\frac{X}{N}\right)=\frac{V(X)}{N^2}=\frac{p(1-p)}{N}\qquad\sigma_{X/N}=\sqrt{\frac{p(1-p)}{N}}\tag{7.276}$$
Example:
In a sample of 100 workers, 25% are late at least once a week. The mean number of late workers and its standard deviation are then:
$$E(k)=Np=100\cdot 0.25=25\qquad\sigma_k=\sqrt{Np(1-p)}=\sqrt{100\cdot 0.25(1-0.25)}\cong 4.33\tag{7.277}$$
Normalized to the unit of $N$, this gives us:
$$E\!\left(\frac{k}{N}\right)=p=0.25\qquad\sigma_{k/N}=\sqrt{\frac{p(1-p)}{N}}\cong 0.0433\tag{7.278}$$
Let us now calculate the mode. Because the function is discrete, we cannot use derivatives. Instead, we use a trick: we compute the ratio:
$$\frac{\mathcal{B}(N,k+1)}{\mathcal{B}(N,k)}\tag{7.279}$$
and we check that this ratio is greater than 1 for every $k<k^*$ and at most 1 for every $k\geq k^*$, for some integer $k^*$, which is the value of $k$ corresponding to the modal value.
Let $a_k=P(X=k)$. We have:
$$a_k=C_N^kp^kq^{N-k}\qquad a_{k+1}=C_N^{k+1}p^{k+1}q^{N-k-1}\tag{7.280}$$
We calculate the ratio $a_{k+1}/a_k$:
$$\frac{a_{k+1}}{a_k}=\frac{C_N^{k+1}p^{k+1}q^{N-k-1}}{C_N^kp^kq^{N-k}}=\frac{\dfrac{N!}{(k+1)!(N-(k+1))!}p^{k+1}q^{N-k-1}}{\dfrac{N!}{k!(N-k)!}p^kq^{N-k}}=\frac{N-k}{k+1}\cdot\frac{p}{q}=\frac{N-k}{k+1}\cdot\frac{p}{1-p}\tag{7.281}$$
What matters now is to analyze:
$$\frac{Np-kp}{k-kp+1-p}\tag{7.282}$$
depending on the value of $k$. First, this ratio is equal to 1, and we therefore have two modes, if:
$$\frac{Np-kp}{k-kp+1-p}=1\Leftrightarrow Np-kp=k-kp+1-p\tag{7.283}$$
that is to say if $k=Np+p-1=p(N+1)-1$. This can be seen as the limit point of interest. But don't forget that we are looking for the $k$ from which the ratio becomes less than 1. So we try the two values (where $\lfloor\cdot\rfloor$ denotes the integer part):
$$k=\lfloor p(N+1)-1\rfloor+1\qquad k=\lfloor p(N+1)-1\rfloor-1\tag{7.284}$$
Injecting these into our ratio, we see that:
$$k^*=M_0=\lfloor p(N+1)-1\rfloor+1=\lfloor p(N+1)\rfloor\tag{7.285}$$
is the value we were looking for.
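The binomial results above (the law (7.261), the alternator example (7.262), the mean (7.268), the variance (7.274) and the mode (7.285)) can be reproduced in a few lines. This is a minimal Python sketch with function names of our own; it relies only on `math.comb` (Python 3.8 or later) and computes the moments by direct summation rather than through the closed forms, so that the two can be compared. The mode is found by brute force; with $N=9$ and $p=0.5$, $p(N+1)$ is an integer and two modes appear, as the ratio argument predicts.

```python
from math import comb, floor, isclose

def binomial_pmf(n, k, p):
    """B(N, k) = C(N, k) p^k q^(N-k), equation (7.261)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_moments(n, p):
    """Mean and variance by direct summation over k = 0..N."""
    mean = sum(k * binomial_pmf(n, k, p) for k in range(n + 1))
    second = sum(k * k * binomial_pmf(n, k, p) for k in range(n + 1))
    return mean, second - mean ** 2

def binomial_modes(n, p):
    """All values of k at which the mass function reaches its maximum (brute force)."""
    probs = [binomial_pmf(n, k, p) for k in range(n + 1)]
    top = max(probs)
    return [k for k, pr in enumerate(probs) if isclose(pr, top, rel_tol=1e-12)]

print(binomial_pmf(100, 1, 0.001))                # alternator example: roughly 9%
print(binomial_moments(100, 0.25))                # workers example: compare with Np = 25 and Npq = 18.75
print(binomial_modes(20, 0.3), floor(0.3 * 21))   # single mode, equal to floor(p(N+1))
print(binomial_modes(9, 0.5))                     # p(N+1) = 5 is an integer: two modes
```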
Finally, there are two possible situations for the mode: a unique modal value, or a double modal value (when $p(N+1)$ is an integer, both $p(N+1)-1$ and $p(N+1)$ are modes).
As we know, the median is the value $x$ of $X$ such that:
$$\sum_{k=0}^{x}C_N^kp^k(1-p)^{N-k}=0.5\tag{7.286}$$
But we have not yet found an easy proof to determine $M_e$ in the general case for the binomial law.
To conclude on the binomial law, we will now develop a result that we will need to build the McNemar paired test for a square contingency table (and as it is square, it is also dichotomous), which we will study in the section of Theoretical Computing.
For this test we need to calculate the covariance of two paired binomial random variables (this is why the covariance is non-zero):
$$\mathrm{cov}(n_i,n_j)=E(n_in_j)-E(n_i)E(n_j)\tag{7.287}$$
As they are paired, this means that:
$$n_i+n_j=n\qquad p_i=1-p_j\qquad 1-p_i=p_j\tag{7.288}$$
And therefore:
$$\mathrm{cov}(n_i,n_j)=E(n_in_j)-E(n_i)E(n_j)=E(n_in_j)-(np_i)(np_j)=E(n_in_j)-n^2p_j(1-p_j)\tag{7.289}$$
Now comes the difficulty, which is to calculate $E(n_in_j)$. To our knowledge there is no other method for this term than looking for the law of the pair (sometimes this approach can be avoided). In this case it is a multinomial distribution (more precisely a trinomial one), which it is customary to write as follows by construction, with $k=n_i$ and $l=n_j$:
$$\mathcal{M}(n,k,l,p_i,p_j,1-p_i-p_j)=\frac{n!}{n_i!\,n_j!\,(n-n_i-n_j)!}p_i^{n_i}p_j^{n_j}(1-p_i-p_j)^{n-n_i-n_j}\tag{7.290}$$
which we will now temporarily write as follows to condense the expression:
$$\mathcal{M}(n,k,l,p,q,r)=\frac{n!}{k!\,l!\,(n-k-l)!}p^kq^lr^{n-k-l}\tag{7.291}$$
So we have a trinomial law, as we are looking for the number of times we get the event $k$, the event $l$, and neither one nor the other (the rest of the time). We then get, the terms where $k=0$ or $l=0$ being zero:
$$E(kl)=\sum_{k,l}kl\,\frac{n!}{k!\,l!\,(n-k-l)!}p^kq^lr^{n-k-l}=\underbrace{0}_{k=0,\,l\neq 0}+\underbrace{0}_{l=0,\,k\neq 0}+\underbrace{0}_{k=0,\,l=0}+\sum_{k\geq 1,\,l\geq 1}kl\,\frac{n!}{k!\,l!\,(n-k-l)!}p^kq^lr^{n-k-l}\tag{7.292}$$
If $k\geq 1$ and $l\geq 1$, we obtain:
$$kl\frac{n!}{k!\,l!\,(n-k-l)!}=kl\frac{n(n-1)(n-2)!}{k(k-1)!\,l(l-1)!\,(n-k-l)!}=n(n-1)\frac{(n-2)!}{(k-1)!\,(l-1)!\,(n-k-l)!}\tag{7.293}$$
Now we use this relation in the joint mean:
$$E(kl)=\sum_{k\geq 1,\,l\geq 1}kl\frac{n!}{k!\,l!\,(n-k-l)!}p^kq^lr^{n-k-l}=n(n-1)\sum_{k\geq 1,\,l\geq 1}\frac{(n-2)!}{(k-1)!\,(l-1)!\,(n-k-l)!}p^kq^lr^{n-k-l}=n(n-1)pq\sum_{\substack{1\leq k\leq n\\1\leq l\leq n}}\frac{(n-2)!}{(k-1)!\,(l-1)!\,(n-k-l)!}p^{k-1}q^{l-1}r^{n-k-l}\tag{7.294}$$
Consider now the special case where $n=2$. We then have:
$$\sum_{\substack{1\leq k\leq 2\\1\leq l\leq 2}}\frac{(2-2)!}{(k-1)!\,(l-1)!\,(2-k-l)!}p^{k-1}q^{l-1}r^{2-k-l}=\frac{0!}{(1-1)!\,(1-1)!\,(2-1-1)!}p^{1-1}q^{1-1}r^{2-1-1}=1\tag{7.295}$$
where the sum reduces to a single term, because if we take for example $k=2$, $l=1$ we get a negative factorial in the denominator. For $n=3$ the result is also 1, and so on (to keep things simple, we will assume that a few numerical examples suffice to convince the reader of the generality of this property, because it is very tedious to write out in LaTeX; in fact, with $k'=k-1$ and $l'=l-1$ the sum is the trinomial expansion of $(p+q+r)^{n-2}=1$). Then we have:
$$E(kl)=n(n-1)pq\sum_{\substack{1\leq k\leq n\\1\leq l\leq n}}\frac{(n-2)!}{(k-1)!\,(l-1)!\,(n-k-l)!}p^{k-1}q^{l-1}r^{n-k-l}=n(n-1)pq\tag{7.296}$$
So, in the end:
$$\mathrm{cov}(n_i,n_j)=E(n_in_j)-n^2p_ip_j=n(n-1)p_ip_j-n^2p_ip_j=-np_ip_j\tag{7.297}$$
And this is the major result we will need for the study of the McNemar test.

7.6.5 Negative Binomial Distribution
The negative binomial distribution applies in the same situation as the binomial distribution, but it gives the probability of having $E$ failures before the $R$-th success when the probability of success is $p$ (or, conversely, the probability of having $R$ successes before the $E$-th failure when the probability of failure is $p$).
We will introduce this important distribution with an example.
Consider for this purpose the following probabilities:
$$P(\text{success})=0.2=p\qquad P(\text{failure})=0.8=1-p=q\tag{7.298}$$
Imagine that we have made 10 trials, that we wanted to stop at the third success, and that the 10th trial is the third success! We write this:
$$[1\;2\;3\;4\;5\;6\;7\;8\;9]\;10\tag{7.299}$$
Now we highlight what we consider as successes (R) and failures (E):
$$\begin{matrix}[1&2&3&4&5&6&7&8&9]&10\\ [E&E&R&E&E&E&R&E&E]&R\end{matrix}\tag{7.300}$$
We therefore have 7 failures and 3 successes. In an experiment where the draws are independent (or can be considered as independent...), the probability of getting this particular result is:
$$(0.8)^7(0.2)^3\tag{7.301}$$
But the order of the successes and failures in the bracketed part is irrelevant. So, as we have 2 successes among the 9 bracketed trials, the probability of obtaining the same result regardless of the order is, using combinatorics:
$$C_9^2(0.8)^7(0.2)^3=\binom{9}{2}(0.8)^7(0.2)^3\cong 0.0604\tag{7.302}$$
which corresponds to the probability of having 7 failures before the 3rd success (or, seen otherwise: the 3rd success at the 10th trial). This can be computed with Microsoft Excel 14.0.6123 or later (7 + 3 = 10 trials: 7 failures and 3 successes):
=NEGBINOM.DIST(7,3,0.2,0) =0.0604
We now generalize the previous notation by writing $k$ for the number of failures, $N$ for the total number of trials and $p$ for the probability of success:
$$\mathcal{NB}(N,k,p)=C_{N-1}^{N-k-1}q^kp^{N-k}=\binom{N-1}{N-k-1}q^kp^{N-k}\tag{7.303}$$
However, there are several possible notations, because the previous relation is not very intuitive in practice, as the reader may perhaps have noticed. Thus, if we instead let $k$ count the events of probability $p$ (rather than the failures of (7.303)), then we have (the most common way of writing it, in my opinion, among many other notations) the following probability of having $N-k$ successes before having $k$ failures, $p$ being the probability of a failure (or of failures before having $k$ successes ...
it's symmetrical!):
$$\mathcal{NB}(N,k,p)=C_{N-1}^{k-1}p^kq^{N-k}=\binom{N-1}{k-1}p^kq^{N-k}\tag{7.304}$$
so the comparison with the formulation of the binomial distribution proved above is perhaps more obvious!
However, it is more common to write the previous relation without $N$, because for the moment the notation is still perhaps not very clear. For this, we denote by $R$ the number of successes, by $E$ the number of failures and by $p$ the probability of a failure, and we then get the probability of having $R$ successes before the $E$-th failure (this is perhaps much clearer...):
$$\mathcal{NB}(R,E,p)=C_{E+R-1}^{E-1}q^{(E+R)-E}p^E=\binom{E+R-1}{E-1}p^Eq^R\tag{7.305}$$
We sometimes find this last relation written with the binomial coefficient made explicit:
$$\mathcal{NB}(R,E,p)=\binom{E+R-1}{E-1}q^Rp^E=\binom{E+R-1}{R}q^Rp^E=\frac{(E+R-1)!}{(E-1)!\,R!}q^Rp^E\tag{7.306}$$
The cumulative probability of having at most $N$ successes before the $E$-th failure is then obviously given by:
$$\sum_{R=0}^{N}\binom{E+R-1}{E-1}q^Rp^E\tag{7.307}$$
Remark
The name of this law comes from the fact that some statisticians use a definition of the binomial coefficient with a negative value in the expression of the function. Since this is a rather rare notation, we do not want to spend time proving the origin of the name. You should also know that this law is also known as "Pascal's law" (like the geometric distribution...) in honor of Blaise Pascal, and also as "Polya's law" in honor of George Polya.
Examples:
E1. A long-term quality control has enabled us to estimate the proportion $p$ of nonconforming pieces at the output of a production line at 2%. We would like to know the cumulative probability of having at most 200 conforming pieces before the 3rd defective piece appears. With Microsoft Excel 14.0.6123 or later, using the negative binomial distribution:
=NEGBINOM.DIST(200,3,0.02,1) =77.35%
E2.
To compare with the binomial distribution, we can ask ourselves what the cumulative probability is of drawing at most 198 non-defective parts out of 201, using Microsoft Excel 14.0.6123 or later:
=BINOM.DIST(198,201,0.98,1) =76.77%
We therefore see that the difference is small. In fact, the difference between the two laws is in practice so small that the binomial law is almost always used instead (but you should still be careful with this choice!).
As usual, we will now determine the mean and variance of this law. Let us start with the mean of having $R$ successes when the $E$-th failure appears, knowing that the probability of a failure is $p$. For this we will use a very simple and ingenious trick (the whole art was to think of it...). If we return to our initial example, with the roles of successes and failures swapped:
$$\begin{matrix}[1&2&3&4&5&6&7&8&9]&10\\ [R&R&E&R&R&R&E&R&R]&E\end{matrix}\tag{7.308}$$
and rewrite this example as follows:
$$[\underbrace{R\;R\;E}_{=X_1}\;\underbrace{R\;R\;R\;E}_{=X_2}\;\underbrace{R\;R\;E}_{=X_3}]\tag{7.309}$$
we notice that the rank of the third failure (here $10=3+4+3$) can be decomposed into the sum of three geometric random variables:
$$X=X_1+X_2+\ldots+X_n\tag{7.310}$$
with, in the case of this particular example, $n=3$, corresponding in fact to $E=3$. So, quite generally, the sum of $n$ independent geometric random variables always gives a negative binomial distribution if the probability $p$ is the same for each geometric variable!
Anyway... as we have proved that the mean and variance of the geometric law are (the mean thus giving the mean rank of the first failure):
$$E(X_i)=\frac{1}{p}\qquad V(X_i)=\frac{q}{p^2}=\frac{1-p}{p^2}\tag{7.311}$$
and since the random variables are independent with the same parameter, the mean rank of the $E$-th failure for the negative binomial follows from the linearity of the mean:
$$E(X)=E(X_1+X_2+\ldots+X_n)=E(X_1)+E(X_2)+\ldots+E(X_n)=nE(X_i)=\frac{n}{p}=\frac{E}{p}\tag{7.312}$$
And therefore, for the variance of the negative binomial distribution (the variables being independent):
$$V(X)=V(X_1+X_2+\ldots+X_n)=V(X_1)+V(X_2)+\ldots+V(X_n)=nV(X_i)=n\frac{1-p}{p^2}=E\frac{1-p}{p^2}\tag{7.313}$$
So, to summarize, the mean and variance of the rank (corresponding to the number of trials $N$, or from another point of view to the mean number of successes, by the simple subtraction $X-E$) needed to obtain the $E$-th failure are:
$$E(X)=\frac{E}{p}\qquad V(X)=E\frac{1-p}{p^2}\tag{7.314}$$
Thus, putting $E=1$, we fall back on the mean and variance of the geometric distribution!
Now, let $Y=X-E$ be the random variable representing the number of successes before the $E$-th failure. We then have the following expressions for the mean and variance, which are very common in the literature (they correspond, for example, to what we can find for the negative binomial law in Wikipedia):
$$E(Y)=E(X-E)=E(X)-E=\frac{E}{p}-E=\frac{E(1-p)}{p}\qquad V(Y)=V(X-E)=V(X)=\frac{E(1-p)}{p^2}\tag{7.315}$$
Example:
What is the expected number (mean) of trials before we come across the third non-conforming part, knowing that the probability of a non-conforming part is 2%?
$$E(X)=\frac{E}{p}=\frac{3}{2\%}=150\tag{7.316}$$
and for the standard deviation:
$$\sigma=\frac{\sqrt{E(1-p)}}{p}=\frac{\sqrt{3(1-0.02)}}{0.02}\cong 85.732\tag{7.317}$$
As always, the reader will find below an example plot of the distribution and cumulative distribution function of the negative binomial law with parameters $\mathcal{NB}(N,k,p)=\mathcal{NB}(N,3,0.6)$, based on the example at the beginning, the only difference being the probability of success, for which we have taken 60% instead of 20%. Thus, there is a 21.6% probability of having the third success at the third trial (i.e. 0 trials more than the number of successes), a 25.92% probability of having the third success at the fourth trial (i.e. one trial more than the number of successes), a 20.7% probability of having the third success at the fifth trial (i.e.
two trials more than the number of successes) and so on:

Figure 4.80 - Negative Binomial law NB (mass and cumulative distribution function)

The above distributions are truncated at 9 (corresponding to 12 trials), but in theory they continue indefinitely. What particularly distinguishes the binomial and geometric distributions from the negative binomial distribution are the tails of the distribution. The negative binomial distribution has an important place in a special regression technique that we will see in the section of Theoretical Computing.

390/5785 info@sciences.ch EAME v3.5-2013 4. Arithmetic

7.6.6 Hypergeometric Distribution

To approach this function, we consider a simple example (not very interesting in practice): an urn containing n balls, of which m are black and the other $m'=n-m$ are white (for several important examples used in industry, refer to the sections of Industrial Engineering or Numerical Methods). We successively draw p balls from the urn, without replacement. The question is to find the probability that, among the p balls, k are black (in this statement the order of the draws does not interest us).

We often talk about "exhaustive sampling" with the hypergeometric distribution because, in contrast to the binomial distribution, the size of the lot from which the sample is drawn appears in the law.

Remark
This is also equivalent to a non-ordered sampling without replacement (see section Probability) with a constraint on the occurrences, sometimes called "simultaneous sampling".

We will often use the hypergeometric distribution in the field of quality and reliability, where the black balls are associated with defective items and the white ones with items without defects.

The p balls can be chosen among the n balls in $C_n^p$ ways (this is the number of possible different outcomes) with, as a reminder (see section Probability):

$$C_n^p=\binom{n}{p}=\frac{n!}{p!\,(n-p)!}\tag{7.318}$$

The k black balls can be chosen among the m black ones in $C_m^k$ ways. The p - k white balls can be chosen in $C_{n-m}^{p-k}$ ways. There are therefore $C_m^k\,C_{n-m}^{p-k}$ ways to have k black balls and p - k white balls. The probability we are looking for is therefore given by (we will see an alternative notation in the section of Industrial Engineering):

$$H(n,p,m,k)=\frac{C_m^k\,C_{n-m}^{p-k}}{C_n^p}=\frac{\dfrac{m!}{k!\,(m-k)!}\cdot\dfrac{(n-m)!}{(p-k)!\,\big((n-m)-(p-k)\big)!}}{\dfrac{n!}{p!\,(n-p)!}}\tag{7.319}$$

and is said to follow a "Hypergeometric distribution" (or "Hypergeometric law"). Fortunately, it can be obtained directly in Microsoft Excel 14.0.7153 with the function HYPGEOM.DIST().

Examples:

E1. We want to develop a small computer program of 10,000 lines of code (n). Experience shows that the failure rate is one bug per 1,000 lines of code (that is, 0.1% of the 10,000 lines, or 10 bugs), which corresponds to the value m. We randomly test about 50% of the functionality of the software before sending it to the customer (the equivalent of 5,000 lines, which is p). The probability of observing 5 bugs (k) is then given with Microsoft Excel 14.0.7153 by:

=HYPGEOMDIST(k,p,m,n) = HYPGEOMDIST(5,5000,0.1%*10000,10000) = 24.62%

E2. In a small one-off production batch of 1,000 pieces, we know from experience with a previous similar manufacturing run that on average 30% are bad because of the complexity of the pieces. We know that a customer will randomly draw 20 pieces to decide whether to accept or reject the lot, and that he will not reject the lot if he finds zero defective pieces among the 20. What is the probability of having exactly 0 defective pieces?

=HYPGEOMDIST(0,20,300,1000) = 0.073%

and since we require a draw with no defective piece, the calculation of the hypergeometric distribution simplifies manually to:

$$\frac{700}{1000}\cdot\frac{699}{999}\cdot\frac{698}{998}\cdot\frac{697}{997}\cdots\frac{681}{981}\cong0.073\%\tag{7.320}$$

It is not forbidden to calculate the mean and variance of the hypergeometric distribution directly, but the reader will guess without much trouble that this calculation would be... relatively indigestible. So we can use an indirect method that is much more interesting!

First, the reader will perhaps, even certainly, have noticed that the experiment of the hypergeometric distribution is a series of Bernoulli trials (without replacement, of course!). So we will cheat by first using the linearity of the mean. For this purpose we define a new variable corresponding implicitly to the experiment of the hypergeometric distribution (a sequence of k Bernoulli trials!):

$$X=\sum_{i=1}^{k}X_i\tag{7.321}$$

where $X_i$ is the success of obtaining a black ball at the i-th draw (either 0 or 1). For every i, the random variable $X_i$ follows a Bernoulli law (by symmetry, each draw considered on its own gives a black ball with the same probability, even without replacement), for which we proved in our study of the Bernoulli distribution that $E(X_i)=p$. Therefore, by the linearity of the mean we have (caution! here p is not the number of balls, but the probability associated with an expected event!):

$$E(X)=E\left(\sum_{i=1}^{k}X_i\right)=\sum_{i=1}^{k}E(X_i)=kp\tag{7.322}$$

In the Bernoulli trial, p is the probability of obtaining the desired item or event (as a reminder...). In the hypergeometric distribution, what interests us is the probability of drawing a black ball (there are m of them, and therefore m' white balls) out of the total of n balls, and the ratio obviously gives us this probability. Thus, we have:

$$\mu=E(X)=kp=k\,\frac{m}{m+m'}=k\,\frac{m}{n}\tag{7.323}$$

where k is the number of draws (do not confuse this with the notation of the initial statement, where it was denoted by the variable p!).
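As a cross-check of (7.319) and of the two Excel examples, here is a minimal Python sketch. It assumes Python 3.8+ for `math.comb`; the function name `hypergeom_pmf` is our own (hypothetical) and simply mirrors the argument order (k, p, m, n) of HYPGEOMDIST. It also sums k·H over all k for a small urn to check the mean (7.323), where the number of draws, written p in HYPGEOMDIST, plays the role of k.

```python
from math import comb

def hypergeom_pmf(k, p, m, n):
    """H(n, p, m, k) of (7.319): probability of k black balls among p balls
    drawn without replacement from an urn of n balls, m of them black.
    Argument order mirrors HYPGEOMDIST(k, p, m, n)."""
    if k < 0 or k > m or p - k < 0 or p - k > n - m:
        return 0.0
    # Exact integer binomial coefficients; only the final division gives a float
    return comb(m, k) * comb(n - m, p - k) / comb(n, p)

# E1: n = 10,000 lines, m = 10 bugs (0.1 %), p = 5,000 lines tested, k = 5 bugs
e1 = hypergeom_pmf(5, 5000, 10, 10000)

# E2: n = 1,000 pieces, m = 300 defective, p = 20 drawn, k = 0 defective
e2 = hypergeom_pmf(0, 20, 300, 1000)

# Manual product (7.320): 700/1000 * 699/999 * ... * 681/981
e2_manual = 1.0
for i in range(20):
    e2_manual *= (700 - i) / (1000 - i)

# Mean by definition versus (number of draws) * m / n, for a small urn
n, p, m = 10, 6, 5
mean = sum(k * hypergeom_pmf(k, p, m, n) for k in range(p + 1))

print(f"E1 = {e1:.2%}, E2 = {e2:.3%} (manual product {e2_manual:.3%})")
print(f"mean = {mean:.4f}, p*m/n = {p * m / n:.4f}")
```

Working with exact integers avoids the overflow that the factorials of (7.319) would cause in floating point for a lot of 10,000 items.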
The mean then gives the average number of black balls in a draw of k balls among n, where m are known to be black. The reader will have noticed that the expected value of the hypergeometric distribution is the same as that of the binomial distribution!

To determine the variance, we use the variance of the Bernoulli distribution and the following relation, proved during the introduction of the mean and covariance at the beginning of this chapter:

$$V(X+Y)=V(X)+V(Y)+2\,\mathrm{cov}(X,Y)=V(X)+V(Y)+2\big(E(XY)-E(X)E(Y)\big)\tag{7.324}$$

Recalling that $X=\sum_{i=1}^{k}X_i$, we get:

$$V(X)=V\left(\sum_{i=1}^{k}X_i\right)=\sum_{i=1}^{k}V(X_i)+2\sum_{1\le i<j\le k}\big(E(X_iX_j)-E(X_i)E(X_j)\big)\tag{7.325}$$

However, for the Bernoulli law we have:

$$V(X_i)=pq=\frac{m}{m+m'}\cdot\frac{m'}{m+m'}=\frac{mm'}{(m+m')^2}=\frac{mm'}{n^2}\tag{7.326}$$

So we already get:

$$\sum_{i=1}^{k}V(X_i)=k\,\frac{mm'}{(m+m')^2}=k\,\frac{mm'}{n^2}\tag{7.327}$$

The calculation of $E(X_iX_j)$ requires a good understanding of probabilities (this will be a good refresher!). The mean $E(X_iX_j)$ is given, as we know, by the sum of the possible values of the product $X_iX_j$ weighted by their probabilities. However, our events are binary: either it is a black ball (1) or it is a white ball (0). So all the terms of the sum in which draws i and j are not both black are zero! The problem is then to calculate the probability of drawing a black ball at both draw i and draw j, which is written:

$$P(X_iX_j=1)=P\big((X_i=1)\cap(X_j=1)\big)=P(X_i=1)\,P(X_j=1\mid X_i=1)=\frac{m}{m'+m}\cdot\frac{m-1}{(m'+m)-1}\tag{7.328}$$

So we finally have:

$$E(X_iX_j)=\frac{m}{m'+m}\cdot\frac{m-1}{(m'+m)-1}\tag{7.329}$$

Therefore:

$$\mathrm{cov}(X_i,X_j)=E(X_iX_j)-E(X_i)E(X_j)=\frac{m}{m'+m}\cdot\frac{m-1}{(m'+m)-1}-\left(\frac{m}{m'+m}\right)^2=\frac{-mm'}{(m'+m)^2\,(m'+m-1)}\tag{7.330}$$

Finally (using the result of the Gauss series seen in the section of Sequences and Series):

$$\begin{aligned}V(X)&=\sum_{i=1}^{k}V(X_i)+2\sum_{1\le i<j\le k}\big(E(X_iX_j)-E(X_i)E(X_j)\big)\\&=k\,\frac{mm'}{(m'+m)^2}+2\cdot\frac{k(k-1)}{2}\cdot\frac{-mm'}{(m'+m)^2\,(m'+m-1)}\\&=\frac{kmm'(m'+m-1)-mm'k(k-1)}{(m'+m)^2\,(m'+m-1)}=\frac{kmm'(m'+m-1-k+1)}{(m'+m)^2\,(m'+m-1)}\\&=\frac{kmm'(m'+m-k)}{(m'+m)^2\,(m'+m-1)}=k\cdot\frac{m}{m'+m}\cdot\frac{m'}{m'+m}\cdot\frac{m'+m-k}{m'+m-1}=kpq\,\frac{n-k}{n-1}\end{aligned}\tag{7.331}$$

where we have used the fact that the sum:

$$\sum_{1\le i<j\le k}\mathrm{cov}(X_i,X_j)\tag{7.332}$$

is composed of:

$$C_k^2=\binom{k}{2}=\frac{k(k-1)}{2}\tag{7.333}$$

terms, which corresponds to the number of ways of choosing a pair (i,j) with i < j. Since:

$$V(X)=kpq\,\frac{n-k}{n-1}\tag{7.334}$$

we can write:

$$\sigma=\sqrt{kpq\,\frac{n-k}{n-1}}\tag{7.335}$$

In the specialized literature, we often find the variance written in the following way, denoting by r the number of expected outcomes (black balls) and by s the number of non-expected outcomes (white balls):

$$V(X)=kpq\,\frac{n-k}{n-1}=k\,\frac{r}{n}\cdot\frac{s}{n}\cdot\frac{n-k}{n-1}=\frac{krs(n-k)}{n^2(n-1)}=\frac{klrs}{n^2(n-1)}\tag{7.336}$$

with $l=n-k$. This last notation will be very useful in the section of Theoretical Computing for our study of the Mantel-Haenszel test.

Furthermore, we see that in:

$$\sigma=\sqrt{kpq}\,\sqrt{\frac{n-k}{n-1}}\tag{7.337}$$

we find the standard deviation of the binomial distribution, apart from a factor noted:

$$\mathrm{fpc}=\sqrt{\frac{n-k}{n-1}}\tag{7.338}$$

which we often find in statistics and which is named the "finite population correction factor".

Here is an example plot of the mass function and cumulative distribution function of the Hypergeometric distribution with parameters (n,p,m,k) = (10,6,5,k):

Figure 4.81 - Hypergeometric law H (mass and cumulative distribution function)

We will now prove that the Hypergeometric distribution tends to a binomial distribution, since this property is used many times in different sections of this book (especially the section of Industrial Engineering).
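Before turning to that proof, the variance (7.334) and the finite population correction factor (7.338) can be checked numerically. The sketch below uses our own hypothetical function names. It computes the variance directly from the mass function (7.319) for the parameters of Figure 4.81 (n = 10 balls, m = 5 black, 6 draws) and compares it with kpq(n-k)/(n-1), where k denotes the number of draws as in the derivation.

```python
from math import comb, sqrt

def hypergeom_pmf(k, draws, black, total):
    """Probability of k black balls in `draws` draws without replacement."""
    if k < 0 or k > black or draws - k < 0 or draws - k > total - black:
        return 0.0
    return comb(black, k) * comb(total - black, draws - k) / comb(total, draws)

def variance_by_definition(draws, black, total):
    """Sum of (k - mu)^2 H(k) over all possible k."""
    pmf = [hypergeom_pmf(k, draws, black, total) for k in range(draws + 1)]
    mu = sum(k * w for k, w in enumerate(pmf))
    return sum((k - mu) ** 2 * w for k, w in enumerate(pmf))

def variance_closed_form(draws, black, total):
    """(7.334): k p q (n - k)/(n - 1) with k = draws, p = black/total, q = 1 - p."""
    p = black / total
    return draws * p * (1 - p) * (total - draws) / (total - 1)

def fpc(draws, total):
    """(7.338): finite population correction factor sqrt((n - k)/(n - 1))."""
    return sqrt((total - draws) / (total - 1))

v_def = variance_by_definition(6, 5, 10)
v_formula = variance_closed_form(6, 5, 10)
binomial_sd = sqrt(6 * 0.5 * 0.5)  # sqrt(kpq), standard deviation with replacement
print(v_def, v_formula, binomial_sd * fpc(6, 10), sqrt(v_formula))
```

The product of the binomial standard deviation and the fpc reproduces the hypergeometric standard deviation, which is exactly the decomposition (7.337).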
To do this, we decompose:

$$H(n,p,m,k)=\frac{C_m^k\,C_{n-m}^{p-k}}{C_n^p}\tag{7.339}$$

We then get:

$$\begin{aligned}H(n,p,m,k)&=\frac{\dfrac{m!}{k!\,(m-k)!}\cdot\dfrac{(n-m)!}{(p-k)!\,\big((n-m)-(p-k)\big)!}}{\dfrac{n!}{p!\,(n-p)!}}=\frac{p!}{k!\,(p-k)!}\cdot\frac{m!}{(m-k)!}\cdot\frac{(n-m)!}{\big((n-m)-(p-k)\big)!}\cdot\frac{(n-p)!}{n!}\\&=C_p^k\cdot\frac{m!}{(m-k)!}\cdot\frac{(n-m)!}{\big((n-m)-(p-k)\big)!}\cdot\frac{(n-p)!}{n!}\end{aligned}\tag{7.340}$$

For the second term:

$$\frac{m!}{(m-k)!}=\frac{1\cdot2\cdot3\cdots m}{1\cdot2\cdot3\cdots(m-k)}=m(m-1)\cdots(m-k+1)\tag{7.341}$$

For $m\to+\infty$ (...) all the factors are then of the order of m. So we have:

$$m(m-1)\cdots(m-k+1)\cong m^k\tag{7.342}$$

For the third term, a development identical to the previous one gives (we of course also need $n\to+\infty$ (...)):

$$\frac{(n-m)!}{\big((n-m)-(p-k)\big)!}\cong(n-m)^{p-k}\tag{7.343}$$

And of course... one could argue about the meaning of n - m when both terms tend to infinity...

Ditto for the fourth term:

$$\frac{(n-p)!}{n!}\cong n^{-p}\tag{7.344}$$

In conclusion we have:

$$H(n,p,m,k)=\frac{C_m^k\,C_{n-m}^{p-k}}{C_n^p}\underset{n,m\to+\infty}{\cong}C_p^k\,\frac{m^k(n-m)^{p-k}}{n^p}\tag{7.345}$$

We change notation by writing p (the number of individuals drawn) as N. We then get:

$$\frac{C_m^k\,C_{n-m}^{N-k}}{C_n^N}\underset{n,m\to+\infty}{\cong}C_N^k\,\frac{m^k(n-m)^{N-k}}{n^N}\tag{7.346}$$

We make another change of notation by writing b for the black balls and w for the white balls. We then get:

$$\frac{C_b^k\,C_{n-b}^{N-k}}{C_n^N}\underset{n,b\to+\infty}{\cong}C_N^k\,\frac{b^k(n-b)^{N-k}}{n^N}=C_N^k\,\frac{b^kw^{N-k}}{n^N}\tag{7.347}$$

Finally, we denote by p the proportion of black balls and by q that of white balls in the lot. We then get:

$$\frac{C_{np}^k\,C_{nq}^{N-k}}{C_n^N}\underset{n\to+\infty}{\cong}C_N^k\,\frac{(np)^k(nq)^{N-k}}{n^N}=C_N^k\,\frac{n^kp^k\,n^{N-k}q^{N-k}}{n^N}=C_N^kp^kq^{N-k}\tag{7.348}$$

We find the binomial distribution!
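The convergence (7.348) can be illustrated numerically. In this minimal sketch (function names hypothetical, parameters chosen only for illustration), a lot of n balls keeps a fixed proportion p = 0.3 of black balls and N = 20 balls are drawn. For each lot size we print the largest gap between the hypergeometric and binomial mass functions, and we expect that gap to shrink as n grows.

```python
from math import comb

def hypergeom_pmf(k, N, b, n):
    """k black among N drawn without replacement, b black out of n balls."""
    if k < 0 or k > b or N - k < 0 or N - k > n - b:
        return 0.0
    return comb(b, k) * comb(n - b, N - k) / comb(n, N)

def binomial_pmf(k, N, p):
    """C_N^k p^k q^(N - k), the limit reached in (7.348)."""
    return comb(N, k) * p**k * (1 - p) ** (N - k)

def max_gap(n, N=20, proportion=0.3):
    """Largest |H - B| over k for a lot of n balls with a fixed black proportion."""
    b = round(proportion * n)
    return max(abs(hypergeom_pmf(k, N, b, n) - binomial_pmf(k, N, proportion))
               for k in range(N + 1))

gaps = {n: max_gap(n) for n in (100, 1000, 10000, 100000)}
for n, gap in gaps.items():
    print(f"n = {n:>6}: max |H - B| = {gap:.2e}")
```

With N = 20, only the lot of 100 balls violates the "less than 10%" sampling rule discussed next, and it is indeed where the gap is largest.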
In practice, it is common to approximate the hypergeometric distribution by a binomial distribution when the ratio of the number of individuals drawn to the total number of individuals is less than 10%.

In practice, Monte Carlo simulations with goodness-of-fit tests (see later in this chapter) have shown that the hypergeometric distribution can be approximated by a Normal distribution (a very important case in the contingency statistical tests that we will study in the section of Theoretical Computing) if the following three conditions are met simultaneously (k being the number of draws, as in the derivation of the mean and variance):

$$k>9\,\frac{n-m}{m},\qquad k>9\,\frac{m}{n-m},\qquad k<\frac{n}{10}\tag{7.349}$$

Thus, graphically and approximately:

Figure 4.82 - Conditions of application of the approximation by a Normal distribution

Thus:

$$H(k,m,n)\cong\mathcal{N}(\mu,\sigma)=\mathcal{N}\left(k\,\frac{m}{n},\sqrt{\frac{krs(n-k)}{n^2(n-1)}}\right)=\mathcal{N}\left(k\,\frac{m}{n},\sqrt{\frac{kmm'(n-k)}{n^2(n-1)}}\right)\tag{7.350}$$

7.6.7 Multinomial Distribution

The multinomial distribution (so called because it involves the binomial coefficient several times) is a law applicable to n distinguishable events, each with a given probability, which occur one or more times, not necessarily in order. This is a frequent case in marketing research, and it will be useful to build the McNemar statistical test that we will study much later (see section Theoretical Computing). We also use this law in quantitative finance (see section Economy).

More technically, consider the space of events $\Omega=\{1,2,\ldots,m\}$ with the probabilities $P(\{i\})=p_i$, $i=1,2,\ldots,m$. We draw an element of $\Omega$ n times, with replacement (see chapter Probability), element i being obtained with probability $p_i$ at each draw. We look for the probability of getting, in a not necessarily ordered way, event 1 $k_1$ times, event 2 $k_2$ times, and so on, in a sequence of n draws.

Remark
This is equivalent to the study of a sampling with replacement (see section Probability) with constraints on the occurrences.
Without constraints, we will see with an example that we fall back on simple sampling with replacement.

We saw in the section of Probabilities that if we take a set of events with multiple outcomes, then the number of different combinations we can get by selecting p elements among n is given by:

$$C_n^p=\frac{n!}{p!\,(n-p)!}\tag{7.351}$$

We therefore have:

$$C_n^{k_1}=\frac{n!}{k_1!\,(n-k_1)!}\tag{7.352}$$

different ways to get a given event $k_1$ times, hence an associated probability of:

$$P(n,k_1)=C_n^{k_1}p_1^{k_1}q_1^{n-k_1}=C_n^{k_1}p_1^{k_1}(1-p_1)^{n-k_1}\tag{7.353}$$

Now comes the particularity of the multinomial distribution: in contrast to the binomial distribution, there are no failures! Each "pseudo-failure" can be considered as a sub-draw of $k_2$ items from the $n-k_1$ remaining elements. Thus the term:

$$C_n^{k_1}p_1^{k_1}q_1^{n-k_1}\tag{7.354}$$

will be written, for the whole experiment and if we consider a particular case limited to two types of events:

$$C_n^{k_1}p_1^{k_1}\,C_{n-k_1}^{k_2}p_2^{k_2}\tag{7.355}$$

with:

$$C_{n-k_1}^{k_2}=\frac{(n-k_1)!}{k_2!\,\big((n-k_1)-k_2\big)!}\tag{7.356}$$

which gives the number of different ways to get a second event $k_2$ times: in the whole sequence of n elements, $k_1$ of them have already been taken, so only $n-k_1$ remain, among which we can get the $k_2$ desired. These relations show us that this is a situation where the probability of each event is treated as a binomial (hence the name...). So in the case of two groups of outcomes we have:

$$C_n^{k_1}p_1^{k_1}\,C_{n-k_1}^{k_2}p_2^{k_2}=\frac{n!}{k_1!\,(n-k_1)!}\cdot\frac{(n-k_1)!}{k_2!\,\big((n-k_1)-k_2\big)!}\,p_1^{k_1}p_2^{k_2}=\frac{n!}{k_1!\,k_2!\,(n-k_1-k_2)!}\,p_1^{k_1}p_2^{k_2}\tag{7.357}$$

and because:

$$n-k_1-k_2=0\tag{7.358}$$

we get:

$$\frac{n!}{k_1!\,k_2!\,0!}\,p_1^{k_1}p_2^{k_2}=\frac{n!}{k_1!\,k_2!}\,p_1^{k_1}p_2^{k_2}\tag{7.359}$$

and we see that the construction of this distribution therefore requires that:

$$\sum_i p_i=1,\qquad\sum_i k_i=n\tag{7.360}$$

Thus, by induction, we have the probability $M$
we were looking for, called the "Multinomial function" and given, using the previous relation, by:

$$M=n!\prod_{i=1}^{m}\frac{p_i^{k_i}}{k_i!}\tag{7.361}$$

In the spreadsheet software Microsoft Excel 11.8346, the term:

$$\frac{n!}{\prod_{i=1}^{m}k_i!}\tag{7.362}$$

is named the "multinomial coefficient" and is available as the function MULTINOMIAL(). In the literature we also sometimes find this term under the following respective notations:

$$\binom{k_1+k_2+\cdots+k_m}{k_1,k_2,\ldots,k_m}\qquad\binom{n}{k_1,k_2,\ldots,k_m}\tag{7.363}$$

Theorem 4.33. We will now show that the multinomial distribution is indeed a probability distribution (because one could doubt it...). If this is the case, as we know, the sum of the probabilities must be equal to 1.

Proof 4.33.1. Recall that in the chapter of Calculus we proved (binomial theorem):

$$(x+y)^n=\sum_{k=0}^{n}C_n^kx^ky^{n-k}\tag{7.364}$$

Now a little change of notation:

$$(x_1+x_2)^n=\sum_{k_1=0}^{n}C_n^{k_1}x_1^{k_1}x_2^{n-k_1}=\sum_{k_1=0}^{n}\frac{n!}{k_1!\,(n-k_1)!}\,x_1^{k_1}x_2^{n-k_1}\tag{7.365}$$

and this time a change of variables ($k_2=n-k_1$):

$$(x_1+x_2)^n=\sum_{k_1=0}^{n}C_n^{k_1}x_1^{k_1}x_2^{n-k_1}=\sum_{k_1=0}^{n}\frac{n!}{k_1!\,k_2!}\,x_1^{k_1}x_2^{k_2}\tag{7.366}$$

This last relation (which is the two-term special case of the "multinomial theorem") will be useful to show that the multinomial distribution is indeed a probability distribution. We again take the special case with two groups:

$$M=n!\prod_{i=1}^{2}\frac{p_i^{k_i}}{k_i!}=\frac{n!}{k_1!\,k_2!}\,p_1^{k_1}p_2^{k_2}\tag{7.367}$$

which can also be written, by the construction of the multinomial distribution:

$$M=\frac{n!}{k_1!\,(n-k_1)!}\,p_1^{k_1}p_2^{n-k_1}\tag{7.368}$$

and therefore the sum must be equal to one, so that:

$$\sum_{k_1=0}^{n}\frac{n!}{k_1!\,(n-k_1)!}\,p_1^{k_1}p_2^{n-k_1}=1\tag{7.369}$$

To check this we use the multinomial theorem shown above:

$$(p_1+p_2)^n=\sum_{k_1=0}^{n}\frac{n!}{k_1!\,k_2!}\,p_1^{k_1}p_2^{k_2}\tag{7.370}$$

However, since by construction of the multinomial distribution the sum of the probabilities $p_1+p_2$ is equal to one, we indeed have:

$$(p_1+p_2)^n=(1)^n=1=\sum_{k_1=0}^{n}\frac{n!}{k_1!\,(n-k_1)!}\,p_1^{k_1}p_2^{n-k_1}\tag{7.371}$$

□ Q.E.D.

Examples:

E1. We roll an unbiased die 12 times.
What is the probability that all 6 faces appear the same number of times (not necessarily consecutively!), that is, twice each?

$$M=n!\prod_{i=1}^{m}\frac{p_i^{k_i}}{k_i!}=12!\prod_{i=1}^{6}\frac{(1/6)^2}{2!}=\frac{12!}{2^6\cdot6^{12}}\cong0.34\%\tag{7.372}$$

where we clearly see that m (here the 6 faces) is the number of outcome groups.

E2. We roll an unbiased die 12 times. What is the probability that one given face appears all 12 times (for instance the "1" appears 12 times; the same holds for the "2", the "3", etc.)?

$$M=12!\cdot\frac{(1/6)^{12}}{12!}\prod_{i=2}^{6}\frac{(1/6)^0}{0!}=\left(\frac{1}{6}\right)^{12}\cong4.6\cdot10^{-10}\tag{7.373}$$

So this last example ends up with a result already known from the binomial distribution.

7.6.8 Poisson Distribution

For some rare events, the probability p is very small and tends to zero. However, the average value np tends to a fixed value as n tends to infinity.

We start from a binomial distribution with mean $\mu=np$, which we assume remains finite when n tends to infinity. The probability of k successes after n trials is (Binomial distribution):

$$B(n,k)=C_n^kp^k(1-p)^{n-k}=C_n^kp^kq^{n-k}=\binom{n}{k}p^kq^{n-k}\tag{7.374}$$

By writing $p=m/n$ (where m temporarily denotes the mean $\mu=np$), this expression can be rewritten as:

$$B(n,k)=\frac{n!}{k!\,(n-k)!}\left(\frac{m}{n}\right)^k\left(1-\frac{m}{n}\right)^{n-k}\tag{7.375}$$

By grouping the terms, we can put it in the form:

$$B(n,k)=\frac{m^k}{k!}\left(1-\frac{m}{n}\right)^n\left[\frac{n!}{(n-k)!\,n^k}\left(1-\frac{m}{n}\right)^{-k}\right]\tag{7.376}$$

We recognize that when n tends to infinity, the second factor of the product has $e^{-m}$ as its limit (see Functional Analysis). As for the third factor, since k stays fixed (we are interested in small values of k, the probability of success being very small), its limit as n tends to infinity is equal to 1. This technique of passing to the limit is sometimes named, in this context, the "Poisson limit theorem".
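The limit just described can also be checked numerically. We keep the mean m = np fixed (here m = 3, the value later used for the Poisson plot, chosen only for illustration) and let n grow. The binomial probabilities then approach the limit law $\mu^ke^{-\mu}/k!$ obtained just below. A minimal Python sketch, with hypothetical function names:

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """B(n, k) = C(n, k) p^k (1 - p)^(n - k); zero when k > n."""
    if k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """Limit law mu^k e^(-mu) / k!."""
    return mu**k / factorial(k) * exp(-mu)

mu = 3.0  # fixed mean m = n p (illustrative value)
gaps = {}
for n in (10, 100, 1000, 10000):
    p = mu / n
    # largest pointwise difference over the first 21 values of k
    gaps[n] = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(21))
    print(f"n = {n:>5}, p = {p:.4f}: max |B - P| = {gaps[n]:.2e}")
```

The gap shrinks roughly tenfold each time n is multiplied by ten, which is why the Poisson law is a good model once p is small and n large.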
So we get the "Poisson distribution" (or "Poisson law"), also sometimes named the "law of rare events", therefore given by:

$$P(k)=\frac{\mu^k}{k!}\,e^{-\mu}\tag{7.377}$$

which can be obtained in Microsoft Excel 11.8346 with the function POISSON(). In practice and in the specialized literature, its parameter is often denoted by μ (or λ).

It is indeed a probability distribution since, using the Taylor series (see chapter Sequences And Series), we show that the sum of the probabilities is:

$$\sum_{k=0}^{+\infty}\frac{\mu^k}{k!}\,e^{-\mu}=e^{-\mu}\sum_{k=0}^{+\infty}\frac{\mu^k}{k!}=e^{-\mu}e^{\mu}=1\tag{7.378}$$

Remark
We will frequently encounter this distribution in different sections of the book, such as in the study of preventive maintenance in the section of Industrial Engineering, or in the section of Quantitative Management for the study of queuing theory (the reader can refer to them for interesting and pragmatic examples), and finally in the field of life and non-life insurance.

Here is an example plot of the Poisson distribution and cumulative distribution function with parameter μ = 3:

Figure 4.83 - Poisson law P (mass and cumulative distribution function)

This distribution is important because it describes many processes whose probability is small and constant. It is often used in "queuing theory" (waiting times), in acceptance and reliability testing, and in statistical quality control. Among other things, it applies to processes such as the emission of light quanta by excited atoms, the number of red blood cells seen under the microscope, or the number of incoming calls to a call center. The Poisson distribution is valid for many observations in nuclear and particle physics.

The mean (average) of the Poisson distribution is (we use the Taylor series of the exponential):

$$\mu=E(k)=\sum_{k=0}^{+\infty}k\,P(k)=\sum_{k=0}^{+\infty}k\,\frac{\mu^k}{k!}\,e^{-\mu}=\mu e^{-\mu}\sum_{k=1}^{+\infty}\frac{\mu^{k-1}}{(k-1)!}=\mu e^{-\mu}e^{\mu}=\mu\tag{7.379}$$

and gives the average number of times that we get the desired outcome. This result may seem confusing...
the mean is expressed by the mean?? Yes, we must simply not forget that it is given from the beginning by:

$$\mu=np\tag{7.380}$$

The variance of the Poisson distribution is itself given by (again we use the Taylor series, writing $k^2=k(k-1)+k$):

$$\begin{aligned}\sigma^2=V(k)&=\sum_{k=0}^{+\infty}(k-\mu)^2P(k)=\sum_{k=0}^{+\infty}k(k-1)\frac{\mu^k}{k!}e^{-\mu}+\sum_{k=0}^{+\infty}k\,\frac{\mu^k}{k!}e^{-\mu}-\mu^2\\&=\mu^2e^{-\mu}\sum_{k=2}^{+\infty}\frac{\mu^{k-2}}{(k-2)!}+\mu-\mu^2=\mu^2+\mu-\mu^2=\mu\end{aligned}\tag{7.381}$$

always with:

$$\mu=np\tag{7.382}$$

The fact that the variance of the Poisson distribution is equal to its mean is named the "equidispersion property of the Poisson distribution". This property is often used in practice as an indicator to identify whether data (with discrete support) are distributed according to a Poisson distribution.

The theoretical laws of statistical distributions are determined assuming an infinite number of measurements. Obviously, in practice we can only perform a finite number N of them. Hence the utility of establishing a correspondence between the theoretical and experimental values. For the experimental values we obviously obtain only an approximation, whose validity is, however, often accepted as sufficient.

Now we will prove an important property of the Poisson distribution in the field of engineering and statistics, which we name "stability by addition". The idea is as follows: let X and Y be two independent random variables with Poisson distributions of respective parameters λ and μ. We want to verify that their sum also follows a Poisson distribution:

$$X+Y=\mathcal{P}_{\lambda+\mu}\tag{7.383}$$

Let us see this:

$$P(X+Y=k)=\sum_{i=0}^{k}P\big((X=i)\cap(Y=k-i)\big)=\sum_{i=0}^{k}P(X=i)\,P(Y=k-i)\tag{7.384}$$

because the events are independent. Then we have:

$$P(X+Y=k)=\sum_{i=0}^{k}P(X=i)\,P(Y=k-i)=\sum_{i=0}^{k}\frac{e^{-\lambda}\lambda^i}{i!}\cdot\frac{e^{-\mu}\mu^{k-i}}{(k-i)!}=\frac{e^{-(\lambda+\mu)}}{k!}\sum_{i=0}^{k}\frac{k!}{i!\,(k-i)!}\,\lambda^i\mu^{k-i}\tag{7.385}$$

However, by applying the binomial theorem (see section Calculus):

$$\sum_{i=0}^{k}\frac{k!}{i!\,(k-i)!}\,\lambda^i\mu^{k-i}=\sum_{i=0}^{k}C_k^i\lambda^i\mu^{k-i}=(\lambda+\mu)^k\tag{7.386}$$

So in the end:

$$P(X+Y=k)=\frac{(\lambda+\mu)^k}{k!}\,e^{-(\lambda+\mu)}\tag{7.387}$$

and therefore the Poisson distribution is stable by addition. So any Poisson distribution is ipso facto indefinitely divisible into a finite or infinite sum of independent Poisson distributions whose parameters add up to its own parameter.

7.6.9 Normal & Gauss-Laplace Distribution

This is the most important distribution in the field of statistics, owing to a famous theorem named the "central limit theorem" which, as we will see later, makes it possible to prove (among other things) that any sum of independent identically distributed random variables with a finite mean and variance, once suitably centered and scaled, converges to a Laplace-Gauss function (Normal distribution). It is therefore very important to focus your attention on the developments that will be presented right now!

Let us start from a binomial distribution and let the number of trials n tend to infinity. If p is fixed from the beginning, the mean $\mu=np$ also tends to infinity; furthermore, the standard deviation $\sigma=\sqrt{npq}$ also tends to infinity.

Remark
The case where p varies and tends to 0 while the mean is kept fixed has already been studied with the Poisson distribution.

If we want to calculate the limit of the binomial distribution, it will then be necessary to make a change of origin, which stabilizes the mean (at 0, for example), and a change of unit, which stabilizes the standard deviation (at 1, for example).

Let us now denote by $P_n(k)$ the binomial probability of k successes, and let us first see how $P_n(k)$ varies with k by calculating the difference:

$$P_n(k+1)-P_n(k)=C_n^{k+1}p^{k+1}q^{n-k-1}-C_n^kp^kq^{n-k}=C_n^kp^kq^{n-k}\left(\frac{(n-k)p}{(k+1)q}-1\right)=P_n(k)\,\frac{np-k-q}{(k+1)q}\tag{7.388}$$

We conclude that $P_n(k)$ is an increasing function of k as long as np - k - q is positive (for n, p and q fixed).
To see this, just take a few values (of the right-hand term of the equality) or look at the graph of the binomial distribution, remembering that:

$$\mu=np\tag{7.389}$$

As q < 1, it is therefore clear that $P_n(k)$ reaches its maximum for the value of k closest to the mean $\mu=np$ of the binomial distribution.

On the other hand, the difference $P_n(k+1)-P_n(k)$ is the rate of increase of the function $P_n(k)$. We can then write:

$$\frac{\Delta P_n(k)}{\Delta k}=\frac{P_n(k+1)-P_n(k)}{(k+1)-k}\tag{7.390}$$

as being the slope of the function.

We now define a new random variable such that its mean is zero (negligible variations) and its standard deviation is equal to one (in other words, a centered-reduced variable). Then we have:

$$x=\frac{k-np}{\sqrt{npq}}\tag{7.391}$$

With this new random variable we also have:

$$\Delta x=\frac{(k+1)-np}{\sqrt{npq}}-\frac{k-np}{\sqrt{npq}}=\frac{[(k+1)-np]-(k-np)}{\sqrt{npq}}=\frac{1}{\sqrt{npq}}\tag{7.392}$$

Let us denote by F(x) the value of $P_n(k)$ calculated using the new random variable with zero mean and unit standard deviation, whose expression we seek when n tends to infinity.

Let us go back to:

$$\frac{\Delta P_n(k)}{\Delta k}=\frac{np-k-q}{(k+1)q}\,P_n(k)=\frac{-(k-np)-q}{(k+1)q}\,P_n(k)\tag{7.393}$$

To simplify the study of this relation when n tends to infinity and k tends to the mean $\mu=np$, let us multiply both sides by $npq/\sqrt{npq}$:

$$\frac{\Delta P_n(k)}{\Delta k}\cdot\frac{npq}{\sqrt{npq}}=\frac{np-k-q}{(k+1)q}\cdot\frac{npq}{\sqrt{npq}}\,P_n(k)=\frac{-(k-np)-q}{(k+1)q}\cdot\frac{npq}{\sqrt{npq}}\,P_n(k)\tag{7.394}$$

We now rewrite the right-hand side of this equality. We then get:

$$\frac{-(k-np)-q}{(k+1)q}\cdot\frac{npq}{\sqrt{npq}}\,P_n(k)=\frac{[-(k-np)-q]\,np}{(k+1)\sqrt{npq}}\,P_n(k)\tag{7.395}$$

And now let us rewrite the left-hand side of the relation before the previous one. We then get:

$$\frac{\Delta P_n(k)}{\Delta k}\cdot\frac{npq}{\sqrt{npq}}=\frac{\Delta P_n(k)\,\sqrt{npq}}{[(k+1)-np]-(k-np)}=\frac{\Delta P_n(k)}{\dfrac{[(k+1)-np]-(k-np)}{\sqrt{npq}}}=\frac{\Delta P_n(k)}{\Delta x}\tag{7.396}$$

Passing to the limit as n tends to infinity, we first have, for the denominator of the term of the relation before the previous one:

$$\frac{[-(k-np)-q]\,np}{(k+1)\sqrt{npq}}\,P_n(k)\tag{7.397}$$

the following simplification:

$$(k+1)\sqrt{npq}\underset{n\to+\infty}{\cong}k\sqrt{npq}\tag{7.398}$$

Thus:

$$-\frac{[(k-np)+q]\,np}{k\sqrt{npq}}\,P_n(k)=-\frac{(k-np)\,np}{k\sqrt{npq}}\,P_n(k)-\frac{qnp}{k\sqrt{npq}}\,P_n(k)\tag{7.399}$$

and secondly, taking into account that the values of k considered are then in the neighborhood of the mean np, we get:

$$\frac{(k-np)\,np}{k\sqrt{npq}}\underset{n\to+\infty}{\cong}\frac{k-np}{\sqrt{npq}}=x\tag{7.400}$$

and:

$$\frac{qnp}{k\sqrt{npq}}\underset{n\to+\infty}{\cong}\frac{qnp}{np\sqrt{npq}}=\frac{q}{\sqrt{npq}}\underset{n\to+\infty}{\longrightarrow}0\tag{7.401}$$

Thus:

$$-\frac{[(k-np)+q]\,np}{k\sqrt{npq}}\underset{n\to+\infty}{\longrightarrow}-x\tag{7.402}$$

and as:

$$P_n(k)\underset{n\to+\infty}{\longrightarrow}F(x)\tag{7.403}$$

where F(x) represents (awkwardly), for the few lines that follow, the density function as n tends to infinity. Finally we have:

$$\frac{dF(x)}{dx}=-x\,F(x)\tag{7.404}$$

This relation can also be rewritten by rearranging the terms:

$$\frac{dF(x)}{F(x)}=-x\,dx\tag{7.405}$$

and by integrating both sides of this equality we obtain (see section Differential And Integral Calculus):

$$\ln F(x)=-\frac{x^2}{2}+c^{te}\tag{7.406}$$

The following function is a solution of the above equation:

$$F(x)=A\,e^{-\frac{x^2}{2}}\tag{7.407}$$

Indeed:

$$\ln\left(A\,e^{-\frac{x^2}{2}}\right)=\ln(A)+\ln\left(e^{-\frac{x^2}{2}}\right)=c^{te}-\frac{x^2}{2}\tag{7.408}$$

The constant is determined by the condition that:

$$\int_{-\infty}^{+\infty}F(x)\,dx=1\tag{7.409}$$

which represents the sum of all probabilities, which must be equal to 1. We prove that for this we need to have:

$$A=\frac{1}{\sqrt{2\pi}}$$

Proof 4.33.2. We have:

$$\int_{-\infty}^{+\infty}e^{-\frac{x^2}{2}}dx=\int_{-\infty}^{+\infty}e^{-\left(\frac{x}{\sqrt2}\right)^2}dx\tag{7.410}$$

and, with $z=x/\sqrt2$:

$$\int_{-\infty}^{+\infty}e^{-\left(\frac{x}{\sqrt2}\right)^2}dx=\sqrt2\int_{-\infty}^{+\infty}e^{-z^2}dz\tag{7.411}$$

So let us focus on the last term of this equality:

$$I=\int_{-\infty}^{+\infty}e^{-x^2}dx=2\int_{0}^{+\infty}e^{-x^2}dx\tag{7.412}$$

since $e^{-x^2}$ is an even function (see section Functional Analysis).
Now we write the square of the integral as follows:

$$I^2=4\lim_{R\to+\infty}\left(\int_0^Re^{-x^2}dx\right)\left(\int_0^Re^{-y^2}dy\right)=4\lim_{R\to+\infty}\int_0^R\!\!\int_0^Re^{-(x^2+y^2)}\,dx\,dy\tag{7.413}$$

and we make a change of variables to polar coordinates, therefore also using the Jacobian in these coordinates (see section Differential And Integral Calculus):

$$I^2=4\lim_{R\to+\infty}\int_0^{\pi/2}\!\!\int_0^Re^{-r^2}r\,dr\,d\theta=4\cdot\frac{\pi}{2}\lim_{R\to+\infty}\int_0^Re^{-r^2}r\,dr=2\pi\lim_{R\to+\infty}\left[-\frac{e^{-r^2}}{2}\right]_0^R=2\pi\left(0+\frac12\right)=\pi\tag{7.414}$$

Therefore:

$$I=\sqrt\pi\tag{7.415}$$

By extension, for $e^{-x^2/2}$ we have:

$$\int_{-\infty}^{+\infty}e^{-\frac{x^2}{2}}dx=A^{-1}=\sqrt{2\pi}\tag{7.416}$$

□ Q.E.D.

We thus obtain the "standard Normal distribution", written as a probability density function (noted with the capital letter F, which can unfortunately lead to confusion in the present development with the notation of the cumulative distribution function... we apologize...):

$$F(x)=\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\tag{7.417}$$

whose cumulative values can be calculated in Microsoft Excel 11.8346 with the function NORMSDIST(). For information, a variable following a centered reduced Normal distribution is by tradition often noted Z ("Zentriert" in German).

Returning to non-normalized variables:

$$x=\frac{k-np}{\sqrt{npq}}=\frac{k-\mu}{\sigma}\tag{7.418}$$

we get the "Gauss-Laplace function" (or "Gauss-Laplace law"), also named the "Normal distribution", given in this book in the form of a probability density by:

$$P(k,\mu,\sigma)=\mathcal{N}(\mu,\sigma)=\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(k-\mu)^2}{2\sigma^2}}\tag{7.419}$$

The cumulative probability (cumulative distribution function) of obtaining a value less than or equal to x is obviously given by:

$$P(k\le x)=\Phi(x)=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{(k-\mu)^2}{2\sigma^2}}\,dk\tag{7.420}$$

Here is an example plot of the density and cumulative distribution function of the Normal law with the example parameters (μ,σ) = (0,1), which is therefore the standard centered reduced Normal distribution:

Figure 4.84 - Normal law N (mass and cumulative distribution function)

This law governs many random phenomena, under very general and frequently encountered conditions. It is also symmetric with respect to the mean (this is important to remember!).

We will now show that μ indeed represents the mean (or average) of x (this is a bit silly, but we can still check...):

$$E(X)=\int_{-\infty}^{+\infty}x\,f(x)\,dx=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}x\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx\tag{7.421}$$

We put:

$$u=\frac{x-\mu}{\sigma}\tag{7.422}$$

We then have:

$$E(X)=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}(\sigma u+\mu)\,e^{-\frac{u^2}{2}}\,\sigma\,du=\frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}u\,e^{-\frac{u^2}{2}}\,du+\frac{\mu}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{u^2}{2}}\,du\tag{7.423}$$

Let us now calculate the first integral:

$$\int_{-\infty}^{+\infty}u\,e^{-\frac{u^2}{2}}\,du=\left[-e^{-\frac{u^2}{2}}\right]_{-\infty}^{+\infty}=0-0=0\tag{7.424}$$

So we finally get:

$$E(X)=\frac{\mu}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{u^2}{2}}\,du=\frac{\mu}{\sqrt{2\pi}}\,\sqrt{2\pi}=\mu\tag{7.425}$$

Remarks
R1. The reader might at first find it confusing that a parameter of a function is one of the results that we seek from this same function (as for the Poisson distribution). What is bothersome is putting such a thing into practice. In fact, everything will become clearer when we discuss, later in this chapter, the concept of "likelihood estimators".
R2. It could be interesting for the reader to know that in practice (finance, quality assurance, etc.) it is common to have to calculate the mean over the positive values of the random variable only, which is then naturally named the "positive mean" and given by:

$$E^{+}(X)=\frac{1}{\sigma\sqrt{2\pi}}\int_{0}^{+\infty}x\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx\tag{7.426}$$

We will see a practical example of this last relation in the section Economy, during our study of Louis Bachelier's theoretical model of speculation.

Also, we will now prove (...)
that σ is the standard deviation of X (in other words, that $V(X)=\sigma^2$), and for this we recall that we proved (Huygens relation):

$$V(X)=E(X^2)-E(X)^2\tag{7.427}$$

We already know, at the level of notation, that:

$$E(X)=\mu\;\Rightarrow\;E(X)^2=\mu^2\tag{7.428}$$

so we first calculate $E(X^2)$:

$$E(X^2)=\int_{-\infty}^{+\infty}x^2f(x)\,dx=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}x^2\,e^{-\frac12\left(\frac{x-\mu}{\sigma}\right)^2}dx\tag{7.429}$$

Let $y=(x-\mu)/(\sqrt2\,\sigma)$, which leads us to:

$$E(X^2)=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}\left(\sqrt2\,\sigma y+\mu\right)^2e^{-y^2}\sqrt2\,\sigma\,dy=\frac{2\sigma^2}{\sqrt\pi}\int_{-\infty}^{+\infty}y^2e^{-y^2}dy+\frac{2\sqrt2\,\sigma\mu}{\sqrt\pi}\int_{-\infty}^{+\infty}y\,e^{-y^2}dy+\frac{\mu^2}{\sqrt\pi}\int_{-\infty}^{+\infty}e^{-y^2}dy\tag{7.430}$$

And we know (already proved above) that:

$$\int_{-\infty}^{+\infty}y\,e^{-y^2}dy=0\qquad\int_{-\infty}^{+\infty}e^{-y^2}dy=\sqrt\pi\tag{7.431}$$

It therefore only remains to calculate the first integral. To do this, we proceed by integration by parts (see Differential and Integral Calculus):

$$\int_a^bf(t)\,g'(t)\,dt=\big[f(t)\,g(t)\big]_a^b-\int_a^bf'(t)\,g(t)\,dt\tag{7.432}$$

which leads us to:

$$\int_{-\infty}^{+\infty}y^2e^{-y^2}dy=\int_{-\infty}^{+\infty}y\left(y\,e^{-y^2}\right)dy=\left[-\frac{y}{2}\,e^{-y^2}\right]_{-\infty}^{+\infty}+\frac12\int_{-\infty}^{+\infty}e^{-y^2}dy=\frac{\sqrt\pi}{2}\tag{7.433}$$

Then we get:

$$E(X^2)=\frac{2\sigma^2}{\sqrt\pi}\cdot\frac{\sqrt\pi}{2}+\frac{2\sqrt2\,\sigma\mu}{\sqrt\pi}\cdot0+\frac{\mu^2}{\sqrt\pi}\,\sqrt\pi=\mu^2+\sigma^2\tag{7.434}$$

And finally:

$$V(X)=E(X^2)-E(X)^2=(\mu^2+\sigma^2)-\mu^2=\sigma^2\tag{7.435}$$

An additional meaning of the standard deviation of the Gauss-Laplace distribution is that of a measure of the width of the distribution since (this can only be checked by numerical integration), for any mean and any non-zero standard deviation, we have (thanks to John Cannin for the LaTeX figure):

Figure 4.85 - Sigma intervals for the Normal distribution (about 68.2% of the probability lies within ±1σ of the mean, 95% within ±2σ and 99.7% within ±3σ; the bands on each side of the mean hold 34.1%, 13.6% and 2.1%)

The width of the interval is of great importance in the interpretation of measurement uncertainties. Presenting a result as N ± σ means that the average value has about a 68.3% chance (probability) of lying between N - σ and N + σ, or approximately 95.4% of lying between N - 2σ and N + 2σ, etc.

This concept is widely used in quality management in industry, especially with the Six Sigma methodology (see Industrial Engineering), which requires mastering 6σ on each side of the mean (!) of the manufacturing process (or of anything else whose deviation is measured).

The second column of the table can easily be obtained with Maple 4.00b (or also with the spreadsheet software from Microsoft). For example, for the first line:

>S:=evalf(int(1/sqrt(2*Pi)*exp(-x^2/2),x=-1..1));

and for the first row of the third column:

>(1-S)*1E6;

If the Normal distribution were not centered, we would simply write for the second column (integrating from μ - 1 to μ + 1, here with σ = 1):

>S:=evalf(int(1/sqrt(2*Pi)*exp(-(x-mu)^2/2),x=mu-1..mu+1));

and so on for any deviation and any mean: we will then obtain exactly the same intervals!!!

The Gauss-Laplace distribution is not only a tool for data analysis but also for data generation. Indeed, this distribution is one of the most widely used in the world of multinationals that use statistical tools for risk management, project management and simulation, where a large number of random variables must be controlled.
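For readers without Maple, the sigma-interval values can be reproduced with the Python standard library. Since the integral of the standard Normal density between -z and +z equals erf(z/√2), no numerical integration is needed. This is a sketch that assumes the table covers ±1σ to ±6σ. The helpers `prob_within` and `ppm_outside` are our own hypothetical names, and `math.erfc` is used for the outside fraction to keep precision far in the tails.

```python
from math import erf, erfc, sqrt

def prob_within(z):
    """P(mu - z*sigma < X < mu + z*sigma) for X ~ N(mu, sigma): erf(z / sqrt(2)).
    The value does not depend on mu or sigma, hence identical intervals for every Normal law."""
    return erf(z / sqrt(2))

def ppm_outside(z):
    """Parts per million falling outside +/- z sigma; erfc avoids cancellation in 1 - erf."""
    return erfc(z / sqrt(2)) * 1e6

for z in range(1, 7):
    print(f"+/-{z} sigma: {prob_within(z):.7f} inside, {ppm_outside(z):,.4f} ppm outside")
```

The independence from μ and σ visible in `prob_within` is the same fact that the Maple remark above expresses with "exactly the same intervals".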
The best examples of applications use the software CrystalBall or Palisade @Risk (the latter being my favorite...). In this context of application (project management), it is also very common to use the sum (task durations) or the product (customer uncertainty) of random variables following Gauss-Laplace distributions. We will now see how to calculate these:

7.6.9.1 Sum of two random Normal variables

Let X, Y be two independent random variables. Suppose that X follows the distribution $\mathcal{N}(\mu_1,\sigma_1)$ and that Y follows the distribution $\mathcal{N}(\mu_2,\sigma_2)$. Then the random variable Z = X + Y has a density equal to the convolution product of $f_X$ and $f_Y$. That is to say:

$$f_Z(s)=\int_{-\infty}^{+\infty}f_X(x)\,f_Y(s-x)\,dx=\frac{1}{2\pi\sigma_1\sigma_2}\int_{-\infty}^{+\infty}e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}\,e^{-\frac{(s-x-\mu_2)^2}{2\sigma_2^2}}\,dx\tag{7.436}$$

which is equivalent to the joint product (see Probabilities) of the probabilities of occurrence of the two continuous variables (remember the same kind of calculation in discrete form!).

To simplify the expression, we make the change of variable $t=x-\mu_1$ and write $a=\mu_1+\mu_2-s$. As:

$$(s-x-\mu_2)^2=(-s+x+\mu_2)^2=(-s+t+\mu_1+\mu_2)^2=(t+a)^2\tag{7.437}$$

we get, after a hard-to-guess rearrangement trick (completing the square, with $\sigma^2=\sigma_1^2+\sigma_2^2$):

$$f_Z(s)=\frac{1}{2\pi\sigma_1\sigma_2}\int_{-\infty}^{+\infty}e^{-\frac{t^2}{2\sigma_1^2}-\frac{(t+a)^2}{2\sigma_2^2}}\,dt=\frac{1}{2\pi\sigma_1\sigma_2}\,e^{-\frac{a^2}{2\sigma^2}}\int_{-\infty}^{+\infty}e^{-\frac{1}{2\sigma_1^2\sigma_2^2}\left(\sigma t+\frac{a\sigma_1^2}{\sigma}\right)^2}dt\tag{7.438}$$

We write:

$$u=\frac{\sigma t+\frac{a\sigma_1^2}{\sigma}}{\sqrt2\,\sigma_1\sigma_2}\;\Rightarrow\;du=\frac{\sigma}{\sqrt2\,\sigma_1\sigma_2}\,dt\;\Rightarrow\;dt=\frac{\sqrt2\,\sigma_1\sigma_2}{\sigma}\,du\tag{7.439}$$

Then:

$$f_Z(s)=\frac{1}{2\pi\sigma_1\sigma_2}\,e^{-\frac{a^2}{2\sigma^2}}\,\frac{\sqrt2\,\sigma_1\sigma_2}{\sigma}\int_{-\infty}^{+\infty}e^{-u^2}du=\frac{1}{\sqrt2\,\pi\sigma}\,e^{-\frac{a^2}{2\sigma^2}}\int_{-\infty}^{+\infty}e^{-u^2}du\tag{7.440}$$

Knowing that (proved above):

$$\int_{-\infty}^{+\infty}e^{-u^2}du=\sqrt\pi\tag{7.441}$$

and:

$$a^2=(-s+\mu_1+\mu_2)^2=(s-\mu_1-\mu_2)^2\tag{7.442}$$

our relation becomes:

$$f_Z(s)=\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(s-\mu_1-\mu_2)^2}{2\sigma^2}}\tag{7.443}$$

We recognize the expression of the Gauss-Laplace distribution (Normal law) of mean $\mu_1+\mu_2$ and standard deviation $\sigma=\sqrt{\sigma_1^2+\sigma_2^2}$.
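The convolution result can be checked numerically. The sketch below approximates the integral (7.436) with the trapezoidal rule and compares it with the Normal density of mean μ1 + μ2 and standard deviation √(σ1² + σ2²). It is illustrative only: the parameter values are arbitrary, the helper names are hypothetical, and the integration range [-50, 50] is simply assumed wide enough for these parameters.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Gauss-Laplace density (7.419)."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def convolution_pdf(s, mu1, s1, mu2, s2, lo=-50.0, hi=50.0, steps=20000):
    """Trapezoidal approximation of f_Z(s) = integral of f_X(x) f_Y(s - x) dx over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * normal_pdf(x, mu1, s1) * normal_pdf(s - x, mu2, s2)
    return total * h

mu1, s1, mu2, s2 = 1.0, 2.0, -0.5, 1.5   # arbitrary illustrative parameters
sigma = sqrt(s1**2 + s2**2)              # = 2.5 for these values
results = []
for s in (-3.0, 0.5, 4.0):
    numeric = convolution_pdf(s, mu1, s1, mu2, s2)
    closed = normal_pdf(s, mu1 + mu2, sigma)
    results.append((s, numeric, closed))
    print(f"s = {s:+.1f}: convolution {numeric:.10f}, closed form {closed:.10f}")
```

Note that it is the variances, not the standard deviations, that add: here 2² + 1.5² = 2.5².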
Therefore $X+Y$ follows the distribution below, as written by physicists (both arguments have the same units):

$$\mathcal{N}\left(\mu_1+\mu_2,\sqrt{\sigma_1^2+\sigma_2^2}\right)\tag{7.444}$$

and as noted by most mathematicians and statisticians:

$$\mathcal{N}\left(\mu_1+\mu_2,\sigma_1^2+\sigma_2^2\right)\tag{7.445}$$

The sum of two independent Normal variables always gives a Normal variable. Statisticians call this property the "stability under summation" of the Gauss-Laplace distribution (Normal law). We will find such properties for other distributions discussed later.

So, as for the Poisson distribution, any Normal distribution whose parameters are known is infinitely divisible. For any integer $n$ it can be written as the sum of $n$ independent Normal distributions. In the statisticians' notation, with the variance as second argument:

$$\mathcal{N}(\mu,\sigma^2)=\sum_{i=1}^{n}\mathcal{N}\left(\frac{\mu}{n},\frac{\sigma^2}{n}\right)\tag{7.446}$$

Remark
The families of distributions that are stable under summation form an important field of study in physics, finance and statistics, called "Lévy alpha-stable distributions". If time permits, I will present the details of this extremely important topic in this chapter.

7.6.9.2 Product of two random Normal variables

Let $X,Y$ be two real independent random variables. We denote by $f_X$ and $f_Y$ the corresponding densities, and we seek the density of the variable $Z=XY$ (a very important case, particularly in engineering). Let $f$ denote the density function of the pair $(X,Y)$. Since $X,Y$ are independent (see section Probabilities):

$$f(x,y)=f_X(x)\,f_Y(y)\tag{7.447}$$

The distribution function of $Z$ is:

$$F(z)=P(Z\le z)=P(XY\le z)=\iint_D f(x,y)\,dx\,dy=\iint_D f_X(x)\,f_Y(y)\,dx\,dy\tag{7.448}$$

where $D=\{(x,y)\mid xy\le z\}$. We rewrite $D$ as a disjoint union, to avoid a division by zero in the upcoming change of variables:

$$D=D_1\cup D_2\cup D_3\tag{7.449}$$

with:

$$D_1=\{(x,y)\in\mathbb{R}^2\mid xy\le z\wedge x>0\},\quad D_2=\{(x,y)\in\mathbb{R}^2\mid xy\le z\wedge x<0\},\quad D_3=\{(x,y)\in\mathbb{R}^2\mid xy\le z\wedge x=0\}\tag{7.450}$$
We have:

$$F(z)=\iint_{D_1}f_X(x)f_Y(y)\,dx\,dy+\iint_{D_2}f_X(x)f_Y(y)\,dx\,dy+\underbrace{\iint_{D_3}f_X(x)f_Y(y)\,dx\,dy}_{=0}\tag{7.451}$$

The last integral is zero because $D_3$ (the line $x=0$) has zero measure (thickness) for the integration along $x$. We then perform the following change of variables:

$$x=x,\qquad t=xy\tag{7.452}$$

The Jacobian of the transformation (see section Differential and Integral Calculus) is:

$$\left|\det\frac{\partial(x,y)}{\partial(x,t)}\right|=\left|\begin{vmatrix}1&0\\-t/x^2&1/x\end{vmatrix}\right|=\frac{1}{|x|}\tag{7.453}$$

Whatever the sign of $x$, the condition $xy\le z$ becomes $t\le z$. Thus:

$$F(z)=\int_{-\infty}^{z}\int_{0}^{+\infty}f_X(x)f_Y\!\left(\frac{t}{x}\right)\frac{1}{|x|}\,dx\,dt+\int_{-\infty}^{z}\int_{-\infty}^{0}f_X(x)f_Y\!\left(\frac{t}{x}\right)\frac{1}{|x|}\,dx\,dt=\int_{-\infty}^{z}\int_{-\infty}^{+\infty}f_X(x)f_Y\!\left(\frac{t}{x}\right)\frac{1}{|x|}\,dx\,dt\tag{7.454}$$

Let $f_Z$ be the density of $Z$. By definition:

$$F(z)=P(Z\le z)=\int_{-\infty}^{z}f_Z(t)\,dt\tag{7.455}$$

On the other hand, as we have just seen:

$$F(z)=\int_{-\infty}^{z}\left(\int_{-\infty}^{+\infty}f_X(x)f_Y\!\left(\frac{t}{x}\right)\frac{1}{|x|}\,dx\right)dt\tag{7.456}$$

Therefore:

$$f_Z(z)=\int_{-\infty}^{+\infty}f_X(x)\,f_Y\!\left(\frac{z}{x}\right)\frac{1}{|x|}\,dx\tag{7.457}$$

Unfortunately, for Gauss-Laplace (Normal) distributions this integral can only easily be calculated numerically... Monte Carlo integration methods are then needed (see section Theoretical Computing). However, according to some research done on the Internet, but without certainty, this integral can be calculated and gives a new distribution called the "Bessel distribution".

7.6.9.3 Bivariate Normal Distribution

If two Normally distributed random variables are independent, their joint probability (density) is equal to the product of their probabilities. So we have:

$$P=P_1P_2=\frac{1}{\sqrt{2\pi}\,\sigma_1}e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}}\cdot\frac{1}{\sqrt{2\pi}\,\sigma_2}e^{-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}}=\frac{1}{2\pi\sigma_1\sigma_2}e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}}\tag{7.458}$$

Now comes an approach that we will often find in the following developments: to generalize simple algebraic models, you have to think in a Linear Algebra way! We can write the exponent as a scalar product of two vectors:

$$P=P_1P_2=\frac{1}{2\pi\sigma_1\sigma_2}\exp\left(-\begin{pmatrix}\frac{x_1-\mu_1}{\sqrt{2}\,\sigma_1}\\\frac{x_2-\mu_2}{\sqrt{2}\,\sigma_2}\end{pmatrix}^{T}\begin{pmatrix}\frac{x_1-\mu_1}{\sqrt{2}\,\sigma_1}\\\frac{x_2-\mu_2}{\sqrt{2}\,\sigma_2}\end{pmatrix}\right)\tag{7.459}$$

But we can do even better, because for the moment this notation brings no added value!
A subtle idea is to bring the determinant of a matrix (see section Linear Algebra) and the inverse of that same matrix into the previous relation:

$$P=P_1P_2=\frac{1}{2\pi\left(\sigma_1^2\sigma_2^2-0\cdot0\right)^{1/2}}\exp\left(-\frac{1}{2}\begin{pmatrix}x_1-\mu_1\\x_2-\mu_2\end{pmatrix}^{T}\begin{pmatrix}\sigma_1^2&0\\0&\sigma_2^2\end{pmatrix}^{-1}\begin{pmatrix}x_1-\mu_1\\x_2-\mu_2\end{pmatrix}\right)\tag{7.460}$$

We thus find a particular case of the variance-covariance matrix. For the bivariate Normal distribution it is customary to write this last relation in the following form:

$$f(x_1,x_2)=\frac{1}{2\pi|\Sigma|^{1/2}}\,e^{-\frac{1}{2}(\vec{x}-\vec{\mu})^{T}\Sigma^{-1}(\vec{x}-\vec{\mu})}\tag{7.461}$$

If we plot this function we get:

Figure 4.86 - Plot of the bivariate Normal function in MATLAB™

or another one (not with the same values) with the corresponding projections:

Figure 4.87 - Plot of the bivariate Normal function with pgfplots

Now consider a case that is important in engineering, astronomy and quantum physics, returning to the following notation:

$$P=f(x_1,x_2)=\frac{1}{2\pi\sigma_1\sigma_2}e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}}\tag{7.462}$$

We focus on the iso-lines, that is, on the pairs of values of the two random variables for which:

$$\frac{1}{2\pi\sigma_1\sigma_2}e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}}=C^{te}\tag{7.463}$$

With some very basic algebraic manipulations, we get:

$$-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}=\ln\left(2\pi\sigma_1\sigma_2C^{te}\right)\tag{7.464}$$

Thus:

$$\frac{(x_1-\mu_1)^2}{2\sigma_1^2}+\frac{(x_2-\mu_2)^2}{2\sigma_2^2}=-\ln\left(2\pi\sigma_1\sigma_2C^{te}\right)\tag{7.465}$$

The right-hand side of (7.465) is positive because $C^{te}$ is smaller than the maximum $1/(2\pi\sigma_1\sigma_2)$ of the density. We therefore get:

$$\frac{(x_1-\mu_1)^2}{-2\sigma_1^2\ln\left(2\pi\sigma_1\sigma_2C^{te}\right)}+\frac{(x_2-\mu_2)^2}{-2\sigma_2^2\ln\left(2\pi\sigma_1\sigma_2C^{te}\right)}=1\tag{7.466}$$

We recognize here the analytical equation of an ellipse (see section Analytical Geometry)!
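Here is a minimal Python sketch of the density (7.461) for a $2\times2$ matrix $\Sigma$. It writes the inverse and determinant explicitly instead of calling a linear algebra library, which keeps it dependency-free. It checks two statements of the text: with a diagonal $\Sigma$ the density equals the product (7.458), and it is constant along the ellipse (7.466). The values $\vec{\mu}=(3,2)$, $\sigma_1=5$, $\sigma_2=3$ and the level (half the peak) are illustrative choices.

```python
import math

def bivariate_normal_pdf(x1, x2, mu1, mu2, var1, var2, cov=0.0):
    """Density (7.461) with Sigma = [[var1, cov], [cov, var2]]."""
    det = var1 * var2 - cov ** 2                      # |Sigma|
    if var1 <= 0 or det <= 0:
        raise ValueError("Sigma must be positive definite")
    d1, d2 = x1 - mu1, x2 - mu2
    # (x - mu)^T Sigma^{-1} (x - mu) with Sigma^{-1} = [[var2, -cov], [-cov, var1]] / det
    q = (var2 * d1 * d1 - 2.0 * cov * d1 * d2 + var1 * d2 * d2) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

mu1, mu2, sig1, sig2 = 3.0, 2.0, 5.0, 3.0

# 1) Uncorrelated case: the joint density is the product of the marginals (7.458).
joint = bivariate_normal_pdf(1.0, 4.0, mu1, mu2, sig1 ** 2, sig2 ** 2)
product = normal_pdf(1.0, mu1, sig1) * normal_pdf(4.0, mu2, sig2)

# 2) Iso-line at level C: by (7.465)-(7.466) the points lie on an ellipse with
#    semi-axes sigma_i * sqrt(2L), where L = -ln(2*pi*sig1*sig2*C).
C = 0.5 / (2.0 * math.pi * sig1 * sig2)              # half of the peak density
L = -math.log(2.0 * math.pi * sig1 * sig2 * C)       # equals ln 2 for this level
levels = []
for j in range(8):
    theta = 2.0 * math.pi * j / 8
    x1 = mu1 + sig1 * math.sqrt(2.0 * L) * math.cos(theta)
    x2 = mu2 + sig2 * math.sqrt(2.0 * L) * math.sin(theta)
    levels.append(bivariate_normal_pdf(x1, x2, mu1, mu2, sig1 ** 2, sig2 ** 2))

print(joint, product)
print(C, levels)
```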
A plot of the iso-lines for an example with a diagonal variance-covariance matrix gives us:

Figure 4.88 - Plot of the iso-lines of the bivariate Normal function (non-correlated case)

But now recall that when we obtained:

$$f(x_1,x_2)=\frac{1}{2\pi|\Sigma|^{1/2}}\,e^{-\frac{1}{2}(\vec{x}-\vec{\mu})^{T}\Sigma^{-1}(\vec{x}-\vec{\mu})}\tag{7.467}$$

the variance-covariance matrix was zero everywhere except on the diagonal, which reflects the independence of the two random variables. We can obviously guess the generalization: when the variance-covariance matrix is non-zero off the diagonal, the two random variables are correlated. The iso-lines then change shape. Here is an example with $\vec{\mu}=(3,2)$ and $\Sigma=\begin{pmatrix}10&5\\5&5\end{pmatrix}$ (the values used in the Maple code below):

Figure 4.89 - Plot of the iso-lines of the bivariate Normal function (correlated case)

So the correlation rotates the axes of the ellipses! Write $\sigma_{11}$ and $\sigma_{22}$ for the variances and $\sigma_{12}=\sigma_{21}$ for the covariance. Then we have:

$$\Sigma=\begin{pmatrix}\sigma_{11}&\sigma_{12}\\\sigma_{21}&\sigma_{22}\end{pmatrix}\quad\Rightarrow\quad\Sigma^{-1}=\frac{1}{\sigma_{11}\sigma_{22}-\sigma_{12}^2}\begin{pmatrix}\sigma_{22}&-\sigma_{12}\\-\sigma_{21}&\sigma_{11}\end{pmatrix}\tag{7.468}$$

and thus:

$$|\Sigma|=\sigma_{11}\sigma_{22}-\sigma_{12}^2\tag{7.469}$$

Recall from our study of the correlation coefficient the relation below. Strictly speaking, the $R$ notation for the correlation is used only when the variances are estimated, but as it is the most common notation in practice we will still use it:

$$\sigma_{12}=\operatorname{cov}(X_1,X_2)=R_{X_1,X_2}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}=R_{12}\sqrt{\sigma_{11}\sigma_{22}}\tag{7.470}$$

Thus:

$$\Sigma^{-1}=\frac{1}{\sigma_{11}\sigma_{22}-\sigma_{12}^2}\begin{pmatrix}\sigma_{22}&-\sigma_{12}\\-\sigma_{21}&\sigma_{11}\end{pmatrix}=\frac{1}{\sigma_{11}\sigma_{22}\left(1-R_{12}^2\right)}\begin{pmatrix}\sigma_{22}&-R_{12}\sqrt{\sigma_{11}\sigma_{22}}\\-R_{12}\sqrt{\sigma_{11}\sigma_{22}}&\sigma_{11}\end{pmatrix}\tag{7.471}$$

and the quadratic form in the exponent of the bivariate Normal takes a form very often found in the literature:

$$\begin{aligned}(\vec{x}-\vec{\mu})^{T}\Sigma^{-1}(\vec{x}-\vec{\mu})&=\frac{\sigma_{22}(x_1-\mu_1)^2+\sigma_{11}(x_2-\mu_2)^2-2R_{12}\sqrt{\sigma_{11}\sigma_{22}}\,(x_1-\mu_1)(x_2-\mu_2)}{\sigma_{11}\sigma_{22}\left(1-R_{12}^2\right)}\\&=\frac{1}{1-R_{12}^2}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2+\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2-2R_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right]\end{aligned}\tag{7.472}$$

Note that if the random variables are centered and reduced, we have:

$$\Sigma^{-1}=\frac{1}{1-R_{12}^2}\begin{pmatrix}1&-R_{12}\\-R_{12}&1\end{pmatrix}\tag{7.473}$$

and thus the exponent of the bivariate Normal distribution becomes:

$$-\frac{1}{2}\vec{x}^{T}\Sigma^{-1}\vec{x}=-\frac{x_1^2+x_2^2-2R_{12}x_1x_2}{2\left(1-R_{12}^2\right)}\tag{7.474}$$

Thus we can write the density function of the centered reduced bivariate Normal distribution. Below, $\mathcal{N}(x;0,v)$ denotes the Normal density of mean 0 and variance $v$ evaluated at $x$:

$$\begin{aligned}f(x_1,x_2,R_{12})&=\frac{1}{2\pi\sqrt{1-R_{12}^2}}\,e^{-\frac{x_1^2+x_2^2-2R_{12}x_1x_2}{2\left(1-R_{12}^2\right)}}=\sqrt{1-R_{12}^2}\;\mathcal{N}\left(x_1;0,1-R_{12}^2\right)\mathcal{N}\left(x_2;0,1-R_{12}^2\right)e^{\frac{R_{12}x_1x_2}{1-R_{12}^2}}\\&=\mathcal{N}(x_1;0,1)\,\mathcal{N}(x_2;0,1)\,\frac{1}{\sqrt{1-R_{12}^2}}\,e^{\frac{2R_{12}x_1x_2-R_{12}^2\left(x_1^2+x_2^2\right)}{2\left(1-R_{12}^2\right)}}\end{aligned}\tag{7.475}$$

So a centered reduced bivariate Normal density can be built by multiplying two centered reduced Normal densities by a term that depends mainly on the correlation parameter. This last term carries the dependence between the two random variables. It links the marginal distributions (both centered reduced Normal) to the joint bivariate Normal distribution.

If necessary (this can be very useful in practice), here is the Maple 4.00b code to plot a bivariate Normal function for the last example. In this code sigma1 and sigma2 are the variances $\sigma_{11}$ and $\sigma_{22}$, hence the sqrt. The same plot is also very simple to do with spreadsheet software such as Microsoft Excel:

    > f:=(x,y,rho,mu1,mu2,sigma1,sigma2)->(1/(2*Pi*sqrt(sigma1*sigma2*(1-rho^2))))*exp((-1/(2*(1-rho^2)))*(((x-mu1)/sqrt(sigma1))^2+((y-mu2)/sqrt(sigma2))^2-2*rho*((x-mu1)/sqrt(sigma1))*((y-mu2)/sqrt(sigma2))));
    > plot3d(f(x,y,5/sqrt(10*5),3,2,10,5),x=-4..10,y=-4..9,grid=[40,40]);

and for the plot with the iso-lines:

    > with(plots):
    > contourplot(f(x,y,5/sqrt(10*5),3,2,10,5),x=-4..10,y=-4..9,grid=[40,40]);

We can check that it is a probability density function by writing:

    > int(int(f(x,y,5/sqrt(10*5),3,2,10,5),x=-infinity..infinity),y=-infinity..infinity);

or calculate the cumulative probability over two intervals:

    > evalf(int(int(f(x,y,5/sqrt(10*5),3,2,10,5),x=-3..4),y=-5..2));

7.6.9.4 Normal Reduced Centered Distribution

The Gauss-Laplace distribution is not tabulated as such. We would need as many numerical tables as there are possible values of the mean $\mu$ and the standard deviation $\sigma$ (which are the parameters of the function, as we have seen).

Therefore, by a change of variable, the Normal distribution becomes the reduced centered Normal distribution, more often named the "standard Normal distribution", where:

1. "Centered" refers to subtracting the mean $\mu$ from the measurements (so the distribution function is symmetric about the vertical axis).
2. "Reduced" refers to dividing by the standard deviation $\sigma$ (so the distribution has unit variance).

By this change of variable, the variable $k$ is replaced by the reduced centered random variable:

$$k^*=\frac{k-\mu}{\sigma}\tag{7.476}$$

If the variable $k$ has mean $\mu$ and standard deviation $\sigma$, then the variable $k^*$ has mean 0 and standard deviation 1. This variable is usually denoted by the letter $Z$. Thus the relation:

$$P(k,\mu,\sigma)=\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(k-\mu)^2}{2\sigma^2}}\tag{7.477}$$

is therefore written (trivially) more simply:

$$P(k^*,0,1)=\frac{1}{\sqrt{2\pi}}\,e^{-\frac{k^{*2}}{2}}\tag{7.478}$$

This is the explicit expression of the reduced centered Normal distribution ("standard Normal"), often denoted $\mathcal{N}(0,1)$. We will find it very often in the sections on physics, finance, quantitative management and engineering!
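The standardization (7.476) and the Taylor-series idea of the remark that follows can both be tried in a few lines. This Python sketch uses only the standard library, and its function names are illustrative. It computes $\Phi(z)=P(Z\le z)$ in two ways. The first uses `math.erf`. The second integrates the series $e^{-t^2/2}=\sum_n(-1)^nt^{2n}/(2^nn!)$ term by term from 0 to $z$, which gives $\Phi(z)=\frac12+\frac{1}{\sqrt{2\pi}}\sum_{n\ge0}\frac{(-1)^nz^{2n+1}}{2^nn!(2n+1)}$.

```python
import math

def standardize(k: float, mu: float, sigma: float) -> float:
    """Reduced centered variable k* = (k - mu) / sigma, equation (7.476)."""
    return (k - mu) / sigma

def phi_erf(z: float) -> float:
    """Standard Normal CDF through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_taylor(z: float, terms: int = 60) -> float:
    """Standard Normal CDF by term-by-term integration of the Taylor series of exp(-t^2/2).

    Accurate for moderate |z| (say |z| <= 4). For large |z| the alternating terms
    grow and cancel, so more terms and more precision would be needed.
    """
    coeff = 1.0            # (-1)^n / (2^n n!) for n = 0
    total = 0.0
    for n in range(terms):
        total += coeff * z ** (2 * n + 1) / (2 * n + 1)
        coeff *= -1.0 / (2.0 * (n + 1))
    return 0.5 + total / math.sqrt(2.0 * math.pi)

# Illustrative values: a measurement of 250 with mu = 200 and sigma = 100
z = standardize(250.0, 200.0, 100.0)
print(z, phi_erf(z), phi_taylor(z))   # z = 0.5, both probabilities close to 0.6915
```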
Remark
The integral of the previous relation over an interval cannot be calculated exactly in closed form. One simple idea is then to expand the exponential in a Taylor series and integrate the series term by term (making sure to take enough terms for convergence!).

7.6.9.5 Henry's Line

In business it is often the Gauss-Laplace (Normal) distribution that is analyzed. However, common and easily accessible software such as Microsoft Excel has no built-in tool to verify that measured data follow a Normal distribution. This is a problem when we only have a frequency analysis and not the original ungrouped data.

The trick is then to use the reduced centered variable built as we saw above:

$$k^*=\frac{k-\mu}{\sigma}\tag{7.479}$$

The idea of Henry's line is to use the linear relation between $k$ and $k^*$ given by the equation of the line:

$$k^*=f(k)=\frac{k}{\sigma}-\frac{\mu}{\sigma}\tag{7.480}$$

Example:

Suppose we have the following frequency analysis of 10,000 receipts in a supermarket:

| Price of receipts | Number of receipts | Cumulated number of receipts | Cumulated relative frequency |
|---|---|---|---|
| [0,50[ | 668 | 668 | 0.068 |
| [50,100[ | 919 | 1,587 | 0.1587 |
| [100,150[ | 1,498 | 3,085 | 0.3085 |
| [150,200[ | 1,915 | 5,000 | 0.5000 |
| [200,250[ | 1,915 | 6,915 | 0.6915 |
| [250,300[ | 1,498 | 8,413 | 0.8413 |
| [300,350[ | 919 | 9,332 | 0.9332 |
| [350,400[ | 440 | 9,772 | 0.9772 |
| [400 and + | 228 | 10,000 | 1 |

Table 4.25 - Supermarket receipt amount distribution

If we now plot this in Microsoft Excel 11.8346 we get:

Figure 4.90 - Distribution of receipts amount

This looks very much like a Normal distribution, so we can use the Henry's line technique in this example without too much risk.

But what can we do now? Well...
now that we know the cumulative frequencies, we still have to calculate each $k^*$. We can use numerical tables or the NORMSINV() function of Microsoft Excel 11.8346 (remember that formal integration of the Gaussian function is not easy...). This gives the values of the standard Normal distribution $\mathcal{N}(0,1)$ that correspond to these cumulative frequencies (cumulative distribution function). So we get the table below (the reader can check it with a statistical table or a favorite software...):

| Upper limit of the interval | Cumulated relative frequency | Corresponding $k^*$ of $\mathcal{N}(0,1)$ |
|---|---|---|
| 50 | 0.068 | -1.5 |
| 100 | 0.1587 | -1 |
| 150 | 0.3085 | -0.5 |
| 200 | 0.5000 | 0 |
| 250 | 0.6915 | 0.5 |
| 300 | 0.8413 | 1 |
| 350 | 0.9332 | 1.5 |
| 400 | 0.9772 | 2 |
| - | 1 | - |

Table 4.26 - Cumulative relative frequencies to the Henry's line

Note that in this type of table, in Microsoft Excel, cumulative frequencies equal to 0 or 1 will generate errors because their $k^*$ would be infinite. These rows must therefore be handled by hand.

As we specified earlier, we have in discrete form:

$$k_i^*=f(k_i)=\frac{k_i}{\sigma}-\frac{\mu}{\sigma}\tag{7.481}$$

So, in Microsoft Excel 11.8346, we can use our table to plot the following chart. Strictly speaking we should perform a proper linear regression, as seen in the chapter on Numerical Methods, with confidence intervals, prediction intervals and so on:

Figure 4.91 - Linearized form of the distribution

The linear regression given by Microsoft Excel 11.8346 gives the result below. You can also calculate it yourself with the linear regression techniques of the chapter on Numerical Methods:

$$k^*=f(k)=\frac{k}{\sigma}-\frac{\mu}{\sigma}=0.01k-2\tag{7.482}$$

from which we immediately deduce:

$$\sigma=100,\qquad\mu=200\tag{7.483}$$

This is thus a particular technique for a particular distribution! Similar techniques, more or less simple (or complicated, depending on the case...), exist for other distributions.

Let us now see another, approximate, approach to this problem.
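Before that, the Henry's line computation above can be reproduced with a short Python sketch. It assumes the Python standard library, version 3.8 or later for `statistics.NormalDist`, whose `inv_cdf` plays the role of Excel's NORMSINV(). The sketch computes the $k^*$ column of Table 4.26 and fits the line (7.481) by ordinary least squares. It then reads $\sigma$ from the slope $1/\sigma$ and $\mu$ from the intercept $-\mu/\sigma$. The rows with cumulative frequency 0 or 1 are left out, as noted above.

```python
from statistics import NormalDist

# Upper limits and cumulated relative frequencies of Table 4.26 (rows 0 and 1 excluded)
upper_limits = [50, 100, 150, 200, 250, 300, 350, 400]
cum_freq = [0.068, 0.1587, 0.3085, 0.5000, 0.6915, 0.8413, 0.9332, 0.9772]

def henry_line(limits, cum):
    """Least-squares fit of k* = k/sigma - mu/sigma; returns (slope, intercept, mu, sigma)."""
    z = [NormalDist().inv_cdf(p) for p in cum]        # k* values, like NORMSINV()
    n = len(limits)
    mean_k = sum(limits) / n
    mean_z = sum(z) / n
    sxx = sum((k - mean_k) ** 2 for k in limits)
    sxz = sum((k - mean_k) * (zi - mean_z) for k, zi in zip(limits, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_k
    sigma = 1.0 / slope              # slope = 1/sigma
    mu = -intercept * sigma          # intercept = -mu/sigma
    return slope, intercept, mu, sigma

slope, intercept, mu, sigma = henry_line(upper_limits, cum_freq)
print(f"k* = {slope:.5f} k {intercept:+.4f}  ->  mu ~ {mu:.1f}, sigma ~ {sigma:.1f}")
```

The fitted values come out close to the rounded $0.01k-2$, and therefore close to $\mu=200$ and $\sigma=100$.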
Let us take our table for this example again:

| Price of receipts | Upper limit of the interval | Center of the interval | Cumulated relative frequency in % |
|---|---|---|---|
| [0,50[ | 50 | 25 | 6.8 |
| [50,100[ | 100 | 75 | 15.87 |
| [100,150[ | 150 | 125 | 30.85 |
| [150,200[ | 200 | 175 | 50.00 |
| [200,250[ | 250 | 225 | 69.15 |
| [250,300[ | 300 | 275 | 84.13 |
| [300,350[ | 350 | 325 | 93.32 |
| [350,400[ | 400 | 375 | 97.72 |
| [400 and + | - | - | 100 |

We now calculate the mean from the central values $x_i$ of the intervals and the class sizes $n_i$, using the relation seen at the beginning of this section:

$$\mu=\frac{\sum_{i=1}^{N}n_ix_i}{\sum_{i=1}^{N}n_i}\tag{7.484}$$

| Price of receipts | Center $x_i$ | Number of receipts $n_i$ | $n_ix_i$ |
|---|---|---|---|
| [0,50[ | 25 | 668 | 16,700 |
| [50,100[ | 75 | 919 | 68,925 |
| [100,150[ | 125 | 1,498 | 187,250 |
| [150,200[ | 175 | 1,915 | 335,125 |
| [200,250[ | 225 | 1,915 | 430,875 |
| [250,300[ | 275 | 1,498 | 411,950 |
| [300,350[ | 325 | 919 | 298,675 |
| [350,400[ | 375 | 440 | 165,000 |
| [400 and + | - | - | - |
| Sum | | 9,772 | 1,914,500 |

Average: $1{,}914{,}500/9{,}772\approx195.92$. The open class [400 and + is left out since it has no center.

The mean calculated in this way is quite close to the one obtained previously with Henry's line.

We now calculate the standard deviation, again from the central values of the intervals and the class sizes, using the relation seen at the beginning of this chapter:

$$\sigma=\sqrt{\frac{\sum_{i=1}^{N}n_i(x_i-\mu)^2}{\sum_{i=1}^{N}n_i}}\tag{7.485}$$

With the same classes this gives a variance of 8364.16 and a standard deviation of 91.45.

The standard deviation calculated in this way is also quite close to the one obtained with Henry's line.

7.6.9.6 Q-Q plot

Another way to judge how well experimental data fit a theoretical distribution (whatever it is!) is the "quantile-quantile plot", or simply "Q-Q plot".
The idea is pretty simple: we compare the experimental data with the theoretical values they should have under the assumed distribution. In our example, we take the mean ($\approx200$) and standard deviation ($\approx100$) obtained with Henry's line as the parameters of the theoretical Normal distribution. We then get:

| Price of receipts | Experimental upper limit (imposed) | Cumulated relative frequency | Theoretical upper limit |
|---|---|---|---|
| [0,50[ | 50 | 6.80% | 50.91 |
| [50,100[ | 100 | 15.87% | 100.02 |
| [100,150[ | 150 | 30.85% | 149.99 |
| [150,200[ | 200 | 50.00% | 200 |
| [200,250[ | 250 | 69.15% | 250.01 |
| [250,300[ | 300 | 84.13% | 299.98 |
| [300,350[ | 350 | 93.32% | 350.00 |
| [350,400[ | 400 | 97.72% | 399.90 |
| [400 and + | - | 100% | - |

Plotted, this gives us the famous Q-Q plot.

We then compare the observed quantiles with those of the assumed theoretical distribution. The closer the points are to the line of unit slope through the origin, the better the fit! It is very visual, very simple and widely used by non-specialists in business statistics.

7.6.10 Log-Normal Distribution

A positive random variable $X$ follows a "log-normal distribution" if, when we write:

$$y=\ln(x)\tag{7.486}$$

$y$ follows a Normal distribution of mean $\mu$ and variance $\sigma^2$ (moments of the Normal distribution).

By the properties of logarithms, a variable can be modeled by a log-normal distribution if it results from the multiplication of many small independent factors. The logarithm turns the product into a sum, and the Normal distribution is stable under addition.

The density function of $X$ for $x>0$ is then (see section Differential and Integral Calculus):

$$f(x)=\frac{1}{\sigma x\sqrt{2\pi}}\,e^{-\frac{(\ln(x)-\mu)^2}{2\sigma^2}}\tag{7.487}$$

Its cumulative distribution function can be calculated in Microsoft Excel 11.8346 with the LOGNORMDIST() function, and its inverse with LOGINV().
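Here is a short Python sketch of the log-normal density (7.487), using only the standard library. `lognormal_cdf` uses the change of variable $y=\ln(x)$, which corresponds to what LOGNORMDIST() returns. `moment` integrates numerically after substituting $x=e^u$. This lets us check the normalization, the mean $e^{\mu+\sigma^2/2}$ and the variance $e^{2\mu+\sigma^2}(e^{\sigma^2}-1)$ derived in (7.496) and (7.503). The values $\mu=0.3$, $\sigma=0.5$ are illustrative.

```python
import math
from statistics import NormalDist

def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density (7.487); zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    return math.exp(-(math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)) / (x * sigma * math.sqrt(2.0 * math.pi))

def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) = P(ln X <= ln x): the Normal CDF evaluated at ln(x)."""
    return NormalDist(mu, sigma).cdf(math.log(x)) if x > 0.0 else 0.0

def moment(power: int, mu: float, sigma: float, n: int = 20_000, width: float = 12.0) -> float:
    """E(X^power) by the midpoint rule after substituting x = e^u (so dx = e^u du)."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        u = lo + (i + 0.5) * h
        x = math.exp(u)
        total += x ** power * lognormal_pdf(x, mu, sigma) * x   # last factor x is dx/du
    return total * h

mu, sigma = 0.3, 0.5
mean = moment(1, mu, sigma)
variance = moment(2, mu, sigma) - mean ** 2
print(moment(0, mu, sigma))                                                    # about 1
print(mean, math.exp(mu + sigma ** 2 / 2))                                     # (7.496)
print(variance, math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1))   # (7.503)
print(lognormal_cdf(math.exp(mu), mu, sigma))                                 # median e^mu, so 0.5
```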
This type of scenario is frequent in physics, in technical maintenance and in financial markets, for example in option pricing models (see the respective sections of the book for application examples). There is also an important remark about the log-normal distribution later on, when we develop the central limit theorem!

Let us show that the cumulative probability corresponds to a Normal distribution if we make the change of variables mentioned above:

$$\int_0^{+\infty}f(x)\,dx=\int_0^{+\infty}\frac{1}{\sigma x\sqrt{2\pi}}\,e^{-\frac{(\ln(x)-\mu)^2}{2\sigma^2}}\,dx=\frac{1}{\sigma\sqrt{2\pi}}\int_0^{+\infty}\frac{1}{x}\,e^{-\frac{(\ln(x)-\mu)^2}{2\sigma^2}}\,dx\tag{7.488}$$

by writing:

$$y=\ln(x)\;\Rightarrow\;dy=\frac{dx}{x}\;\Leftrightarrow\;dx=x\,dy\tag{7.489}$$

and (by definition):

$$x=e^{y}\tag{7.490}$$

we then get:

$$\int_0^{+\infty}f(x)\,dx=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}\frac{1}{x}\,e^{-\frac{(y-\mu)^2}{2\sigma^2}}\,x\,dy=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{(y-\mu)^2}{2\sigma^2}}\,dy\tag{7.491}$$

So we find the Normal distribution again!

The mean of $X$ is then given by the integral below. The natural logarithm is not defined for $x\le0$, so we start the integral at zero:

$$E(X)=\int_0^{+\infty}xf(x)\,dx=\frac{1}{\sigma\sqrt{2\pi}}\int_0^{+\infty}e^{-\frac{(\ln(x)-\mu)^2}{2\sigma^2}}\,dx=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{(u-\mu)^2}{2\sigma^2}+u}\,du\tag{7.492}$$

where we performed the change of variable:

$$u=\ln(x),\qquad x=e^{u},\qquad dx=x\,du=e^{u}\,du\tag{7.493}$$

The expression:

$$-\frac{(u-\mu)^2}{2\sigma^2}+u\tag{7.494}$$

is moreover equal to:

$$-\frac{1}{2\sigma^2}\left(\left(u-(\mu+\sigma^2)\right)^2-(\mu+\sigma^2)^2+\mu^2\right)\tag{7.495}$$

so the last integral becomes:

$$E(X)=e^{\frac{(\mu+\sigma^2)^2-\mu^2}{2\sigma^2}}\,\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{\left(u-(\mu+\sigma^2)\right)^2}{2\sigma^2}}\,du=e^{\frac{(\mu+\sigma^2)^2-\mu^2}{2\sigma^2}}=e^{\mu+\frac{\sigma^2}{2}}\tag{7.496}$$

Here we used a property that emerged during our study of the Normal distribution: any integral of the form

$$\int_{-\infty}^{+\infty}e^{-\frac{(x-c^{te})^2}{2\sigma^2}}\,dx=\sigma\sqrt{2\pi}\tag{7.497}$$

always has the same value!

To calculate the variance, recall that for a random variable $X$ we have the Huygens theorem:

$$V(X)=E(X^2)-E(X)^2\tag{7.498}$$
Let us calculate $E(X^2)$ by proceeding as in the previous developments:

$$\begin{aligned}E(X^2)&=\int_0^{+\infty}x^2f(x)\,dx=\frac{1}{\sigma\sqrt{2\pi}}\int_0^{+\infty}x\,e^{-\frac{(\ln(x)-\mu)^2}{2\sigma^2}}\,dx=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{u}\,e^{-\frac{(u-\mu)^2}{2\sigma^2}}\,e^{u}\,du=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{(u-\mu)^2}{2\sigma^2}+2u}\,du\\&=e^{\frac{(\mu+2\sigma^2)^2-\mu^2}{2\sigma^2}}\,\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{\left(u-(\mu+2\sigma^2)\right)^2}{2\sigma^2}}\,du=e^{\frac{4\mu\sigma^2+4\sigma^4}{2\sigma^2}}=e^{2\mu+2\sigma^2}=e^{2(\mu+\sigma^2)}\end{aligned}\tag{7.499}$$

where once again we made the change of variable:

$$u=\ln(x),\qquad x=e^{u}\;\Rightarrow\;dx=e^{u}\,du\tag{7.500}$$

and where we transformed the expression:

$$-\frac{(u-\mu)^2}{2\sigma^2}+2u\tag{7.501}$$

into:

$$-\frac{1}{2\sigma^2}\left(\left(u-(\mu+2\sigma^2)\right)^2-(\mu+2\sigma^2)^2+\mu^2\right)\tag{7.502}$$

Then:

$$V(X)=E(X^2)-E(X)^2=e^{2\mu+2\sigma^2}-\left(e^{\mu+\frac{\sigma^2}{2}}\right)^2=e^{2\mu+2\sigma^2}-e^{2\mu+\sigma^2}=e^{2\mu+\sigma^2}\left(e^{\sigma^2}-1\right)\tag{7.503}$$

Here is an example plot of the density and cumulative distribution of the log-normal function with parameters $(\mu,\sigma)=(0,1)$:

Figure 4.93 - Log-Normal law (mass and cumulative distribution function)

7.6.11 Continuous Uniform Distribution

Let $a<b$. We define the continuous uniform distribution function, or "uniform function", by the relation:

$$U_{a,b}(x)=\frac{1_{[a,b]}(x)}{b-a}\tag{7.504}$$

where $1_{[a,b]}$ means that the distribution function is zero outside the domain of definition $[a,b]$. We will find this notation later in some other distribution functions.
So we have for the cumulative distribution function, for $a\le x\le b$ (it is 0 before $a$ and 1 after $b$):

$$P(X\le x)=\int_{-\infty}^{x}\frac{1_{[a,b]}(t)}{b-a}\,dt=\frac{1}{b-a}\int_a^{x}dt=\frac{x-a}{b-a}\tag{7.505}$$

It is indeed a distribution function because it satisfies (simple integral):

$$\int_{-\infty}^{+\infty}U_{a,b}(x)\,dx=\frac{1}{b-a}\int_a^{b}dx=\frac{[x]_a^b}{b-a}=\frac{b-a}{b-a}=1\tag{7.506}$$

The expected mean of the continuous uniform function is:

$$\mu=E(X)=\int_{-\infty}^{+\infty}xf(x)\,dx=\frac{1}{b-a}\int_a^{b}x\,dx=\frac{1}{b-a}\left[\frac{x^2}{2}\right]_a^b=\frac{1}{b-a}\,\frac{b^2-a^2}{2}=\frac{(b+a)(b-a)}{2(b-a)}=\frac{a+b}{2}\tag{7.507}$$

and its variance, using the Huygens theorem, is:

$$\begin{aligned}V(X)&=E(X^2)-E(X)^2=\int_{-\infty}^{+\infty}x^2f(x)\,dx-\left(\frac{a+b}{2}\right)^2=\frac{1}{b-a}\left[\frac{x^3}{3}\right]_a^b-\left(\frac{a+b}{2}\right)^2=\frac{1}{b-a}\,\frac{(b-a)(b^2+ab+a^2)}{3}-\left(\frac{a+b}{2}\right)^2\\&=\frac{1}{3}(b^2+ab+a^2)-\frac{1}{4}(b^2+2ab+a^2)=\frac{4b^2+4ab+4a^2-3b^2-6ab-3a^2}{12}=\frac{b^2-2ab+a^2}{12}=\frac{(b-a)^2}{12}\end{aligned}\tag{7.508}$$

Here is an example plot of the density and cumulative distribution of the continuous uniform function with parameters $(a,b)=(0,1)$:

Figure 4.94 - Uniform continuous law (mass and cumulative distribution function)

Remark
This function is often used in business simulation when the random variable is equally likely to take any value within a certain interval (typically for portfolio returns or project duration estimates). The best example of application is again the CrystalBall or @Risk software, which integrate with Microsoft Project.

Let us see an interesting result about the continuous uniform distribution (which also applies to the discrete one...). I often hear managers (who consider themselves high-level) make the following claim. If a measure has an equal probability of occurring anywhere in a given closed interval, they say, then the sum of two such independent random variables also has a uniform distribution!
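Before the proof, the normalization (7.506), the mean (7.507) and the variance (7.508) can be checked numerically with this Python sketch. It uses only the standard library and integrates with the midpoint rule. The interval $[2,5]$ is an arbitrary example.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """U_{a,b}(x) = 1_[a,b](x) / (b - a), equation (7.504)."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def integrate(f, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, b = 2.0, 5.0
area = integrate(lambda x: uniform_pdf(x, a, b), a, b)
mean = integrate(lambda x: x * uniform_pdf(x, a, b), a, b)
second = integrate(lambda x: x * x * uniform_pdf(x, a, b), a, b)
variance = second - mean ** 2            # Huygens theorem
print(area, mean, (a + b) / 2)           # approximately 1, 3.5 and 3.5
print(variance, (b - a) ** 2 / 12)       # approximately 0.75 and 0.75
```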
Now we will prove that this is not the case (if someone has a more elegant proof, I'm interested)!

Proof 4.33.3.
Consider two independent random variables $X$ and $Y$ that follow a uniform distribution on a closed interval $[0,a]$. We look for the density of their sum:

$$Z=X+Y\tag{7.509}$$

Then we have (omitting the normalization constant $1/a$ to lighten the notation):

$$f_X(x)=\begin{cases}1&\text{if }0\le x\le a\\0&\text{otherwise}\end{cases}\tag{7.510}$$

$$f_Y(y)=\begin{cases}1&\text{if }0\le y\le a\\0&\text{otherwise}\end{cases}\tag{7.511}$$

with the variable:

$$0\le z\le 2a\tag{7.512}$$

To calculate the distribution of the sum, remember that it is the continuous equivalent of the joint product of probabilities (see section Probabilities) of the occurrence of the two variables (remember the same kind of calculation in discrete form!). That is to say:

$$f_Z(z)=\int_{-\infty}^{+\infty}f_X(z-y)\,f_Y(y)\,dy\tag{7.513}$$

As $f_Y(y)=1$ if $0\le y\le a$ and 0 otherwise, the previous convolution product reduces to:

$$f_Z(z)=\int_0^{a}f_X(z-y)\,dy\tag{7.514}$$

The integrand is 0 by definition, except in the interval $0\le z-y\le a$, where it is 1. Let us focus on the bounds of the integral, which are the only interesting part here...

First we change variables by writing:

$$u=z-y\tag{7.515}$$

thus:

$$du=-dy\tag{7.516}$$

After this change of variable the integral can be written:

$$f_Z(z)=\int_0^{a}f_X(z-y)\,dy=-\int_{z}^{z-a}f_X(u)\,du=\int_{z-a}^{z}f_X(u)\,du\tag{7.517}$$

We saw at the beginning that $0\le z\le 2a$, so the integral is immediately zero if $z<0$ or $z>2a$. We consider two cases, because the convolution of these two rectangular functions behaves differently in two phases. First the rectangles overlap more and more (they nest), which happens for $0\le z\le a$. Then they recede from each other, which happens for $a\le z\le 2a$.
• In the first case (nesting) where $0\le z\le a$:

$$f_Z(z)=\int_{z-a}^{z}f_X(u)\,du=\int_0^{z}du=[u]_0^{z}=z\tag{7.518}$$

where we changed the lower bound to 0 because $f_X(u)$ is zero for any negative value (and when $0\le z\le a$, $z-a$ is precisely zero or negative!).

• In the second case (separation) where $a\le z\le 2a$:

$$f_Z(z)=\int_{z-a}^{z}f_X(u)\,du=\int_{z-a}^{a}du=a-(z-a)=2a-z\tag{7.519}$$

where we changed the upper bound to $a$ because $f_X(u)$ is zero for any larger value (and when $a\le z\le 2a$, $z$ is larger than $a$).

So in the end we have:

$$f_Z(z)=\begin{cases}z&\text{if }0\le z\le a\\2a-z&\text{if }a\le z\le 2a\\0&\text{otherwise}\end{cases}\tag{7.520}$$

With the normalized densities $1/a$ of the two variables, these expressions are simply divided by $a^2$, and the area under $f_Z$ is then 1. The density of the sum is therefore triangular, not uniform.

□ Q.E.D.

This is a particular, deliberately simplified case of the triangular distribution that we will discover just after... This result (which may not seem intuitive) can be checked in a few seconds with spreadsheet software like Microsoft Excel 11.8346, using the RANDBETWEEN() and FREQUENCY() functions.

7.6.12 Triangular Distribution

Let $a<c<b$. We define the "triangular distribution" (or "triangular function") by construction from the following two distribution functions:

$$P_{a,c}(x)=\frac{2(x-a)}{(b-a)(c-a)}\,1_{[a,c]}(x)\qquad\text{and}\qquad P_{c,b}(x)=\frac{2(b-x)}{(b-a)(b-c)}\,1_{]c,b]}(x)\tag{7.522}$$

where $a$ is often taken as the optimistic value, $c$ as the modal value and $b$ as the pessimistic value. This is the only way to write this distribution function, as the reader can see by noting that a triangle with a base of length $b-a$ must have a height $h=2/(b-a)$ for its total area to equal unity (we will soon prove it).
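The spreadsheet check suggested above can also be sketched in Python, using only the standard library. `random.uniform` draws continuous values, whereas RANDBETWEEN() draws integers. The seed, sample size and number of bins are arbitrary choices. The histogram of $X+Y$, for $X,Y$ uniform on $[0,A]$, is compared with the triangular law just defined with $(a,c,b)=(0,A,2A)$, which is the normalized version of (7.520).

```python
import random

def triangular_cdf(x: float, a: float, c: float, b: float) -> float:
    """CDF of the triangular law (7.522): integral of P_{a,c} + P_{c,b} up to x."""
    if x <= a:
        return 0.0
    if x <= c:
        return (x - a) ** 2 / ((b - a) * (c - a))
    if x < b:
        return 1.0 - (b - x) ** 2 / ((b - a) * (b - c))
    return 1.0

A = 1.0                                   # X and Y are uniform on [0, A]
n_samples, n_bins = 200_000, 10
rng = random.Random(12345)
counts = [0] * n_bins
for _ in range(n_samples):
    z = rng.uniform(0.0, A) + rng.uniform(0.0, A)
    counts[min(int(z / (2 * A) * n_bins), n_bins - 1)] += 1   # binning, like FREQUENCY()

width = 2 * A / n_bins
observed = [cnt / n_samples for cnt in counts]
expected = [triangular_cdf((i + 1) * width, 0.0, A, 2 * A) - triangular_cdf(i * width, 0.0, A, 2 * A)
            for i in range(n_bins)]
for i in range(n_bins):
    print(f"[{i * width:.1f}, {(i + 1) * width:.1f}[  observed {observed[i]:.4f}  triangular {expected[i]:.4f}")
```

The observed frequencies rise and then fall, matching the triangle rather than a flat profile.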
Here is an example plot of the triangular density and cumulative distribution for the parameters $(a,c,b)=(0,3,5)$:

Figure 4.95 - Triangular law (mass and cumulative distribution function)

The slope of the first straight line (increasing, on the left) is obviously:

$$\frac{2}{(b-a)(c-a)}\tag{7.523}$$

and the slope of the second straight line (decreasing, on the right) is:

$$\frac{-2}{(b-a)(b-c)}\tag{7.524}$$

This function is a distribution function if it satisfies:

$$P=\int_{-\infty}^{+\infty}\left(P_{a,c}(x)+P_{c,b}(x)\right)dx\overset{?}{=}1\tag{7.525}$$

In this case the integral is simply the area of the triangle, which we recall is the base multiplied by the height divided by 2 (see section Geometric Shapes):

$$P=\frac{(b-a)\cdot\frac{2}{b-a}}{2}=1\tag{7.526}$$

Remark
This function is widely used in project management for task duration estimates, and in industrial simulations. There $a$ corresponds to the optimistic value, $c$ to the most likely value (mode) and $b$ to the pessimistic value. The best example of application is again the CrystalBall or @Risk software, which are add-ins for Microsoft Project.

The mean (average) of the triangular function is:

$$\begin{aligned}\mu&=\int_{-\infty}^{+\infty}xf(x)\,dx=\int_a^{c}x\,\frac{2(x-a)}{(b-a)(c-a)}\,dx+\int_c^{b}x\,\frac{2(b-x)}{(b-a)(b-c)}\,dx\\&=\frac{2}{(b-a)(c-a)}\left[\frac{x^3}{3}-\frac{ax^2}{2}\right]_a^c+\frac{2}{(b-a)(b-c)}\left[\frac{bx^2}{2}-\frac{x^3}{3}\right]_c^b\\&=\frac{2}{(b-a)(c-a)}\,\frac{(c-a)^2(2c+a)}{6}+\frac{2}{(b-a)(b-c)}\,\frac{(b-c)^2(b+2c)}{6}=\frac{(c-a)(2c+a)+(b-c)(b+2c)}{3(b-a)}\\&=\frac{b^2-a^2+c(b-a)}{3(b-a)}=\frac{a+b+c}{3}\end{aligned}\tag{7.527}$$
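The area (7.526) and the mean (7.527) can be checked numerically for the plotted example $(a,c,b)=(0,3,5)$. The small Python sketch below uses the standard library and the midpoint rule, and its function names are illustrative.

```python
def triangular_pdf(x: float, a: float, c: float, b: float) -> float:
    """P_{a,c}(x) + P_{c,b}(x) from (7.522)."""
    if a <= x <= c:
        return 2.0 * (x - a) / ((b - a) * (c - a))
    if c < x <= b:
        return 2.0 * (b - x) / ((b - a) * (b - c))
    return 0.0

def integrate(f, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, c, b = 0.0, 3.0, 5.0
area = integrate(lambda x: triangular_pdf(x, a, c, b), a, b)
mean = integrate(lambda x: x * triangular_pdf(x, a, c, b), a, b)
print(area)                      # about 1
print(mean, (a + b + c) / 3)     # both about 2.6667
```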
and for the variance:

$$\sigma^2=\int_{-\infty}^{+\infty}(x-\mu)^2f(x)\,dx=\int_a^{c}(x-\mu)^2\,\frac{2(x-a)}{(b-a)(c-a)}\,dx+\int_c^{b}(x-\mu)^2\,\frac{2(b-x)}{(b-a)(b-c)}\,dx\tag{7.528}$$

Replacing $\mu$ by the result obtained before, we get after simplification (it is boring algebra...):

$$\sigma^2=\frac{a^2+b^2+c^2-ab-ac-bc}{18}\tag{7.529}$$

We can show that the sum of two independent random variables, each uniformly distributed on $[a,b]$ (i.e. independent and identically distributed), follows a triangular distribution on $[2a,2b]$. If they do not have the same bounds, however, their sum gives something that has no name to my knowledge...

7.6.13 Pareto Distribution

The "Pareto distribution" (or "Pareto law"), also named "power law" or "scale law", is the formalization of the 80-20 principle. This decision tool helps determine the critical factors (about 20%) that influence the majority (80%) of the goal.

Remark
This distribution is a fundamental and basic tool in quality management (see sections Industrial Engineering and Quantitative Management). It is also used in reinsurance. Queueing theory took an interest in this distribution when research in the 1990s showed that it seems to explain well a number of variables observed in Internet traffic (and more generally on all high-speed data networks).

By definition, a random variable follows a Pareto distribution if its cumulative distribution function is given by:

$$P(X\le x)=1-\left(\frac{x_m}{x}\right)^{k}\tag{7.530}$$

with $x$ greater than or equal to $x_m$.
The Pareto density function (distribution function) is then given by:

$$f(x)=\frac{d}{dx}\left(1-\left(\frac{x_m}{x}\right)^{k}\right)=-x_m^{k}\,\frac{d}{dx}x^{-k}=\frac{kx_m^{k}}{x^{k+1}}\tag{7.531}$$

with $k\in\mathbb{R}^{+*}$ and $x\ge x_m>0$ (so $x>0$). The Pareto distribution is defined by two parameters, $x_m$ and $k$ (named the "Pareto index").

This distribution is also said to be "scale invariant" or a "fractal distribution", because of the following property:

$$f(c^{te}\cdot x)=kx_m^{k}\left(c^{te}\cdot x\right)^{-k-1}=\left(c^{te}\right)^{-k-1}kx_m^{k}x^{-k-1}=\left(c^{te}\right)^{-k-1}f(x)\propto f(x)\tag{7.532}$$

The Pareto function is indeed a distribution function. Using the known cumulative distribution function, we have:

$$\int_{x_m}^{+\infty}f(x)\,dx=\left[1-\left(\frac{x_m}{x}\right)^{k}\right]_{x_m}^{+\infty}=(1-0)-\left(1-1^{k}\right)=1\tag{7.533}$$

The expected mean is given by:

$$E(X)=\int_{x_m}^{+\infty}xf(x)\,dx=\int_{x_m}^{+\infty}x\,\frac{kx_m^{k}}{x^{k+1}}\,dx=kx_m^{k}\int_{x_m}^{+\infty}x^{-k}\,dx=kx_m^{k}\left[\frac{x^{-k+1}}{-k+1}\right]_{x_m}^{+\infty}=\frac{kx_m}{k-1}\tag{7.534}$$

if $k>1$. If $k\le1$, the mean does not exist (the integral diverges).

To calculate the variance, we use the Huygens theorem:

$$V(X)=E(X^2)-E(X)^2\tag{7.535}$$

We get:

$$E(X^2)=\int_{x_m}^{+\infty}x^2f(x)\,dx=kx_m^{k}\int_{x_m}^{+\infty}x^{1-k}\,dx=kx_m^{k}\left[\frac{x^{2-k}}{2-k}\right]_{x_m}^{+\infty}=\frac{kx_m^2}{k-2}\tag{7.536}$$

if $k>2$. If $k\le2$, $E(X^2)$ does not exist. So if $k>2$:

$$\sigma^2=V(X)=\frac{kx_m^2}{k-2}-\left(\frac{kx_m}{k-1}\right)^2=\frac{kx_m^2}{(k-1)^2(k-2)}\tag{7.537}$$

If $k\le2$, the variance does not exist.

Here is an example plot of the Pareto density and cumulative distribution for the parameters $(x_m,k)=(1,2)$:

Note that when $k\to+\infty$ the distribution approaches $\delta(x-x_m)$, where $\delta$ is the Dirac delta function.
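The Pareto formulas (7.530) to (7.537) can be gathered in a small Python sketch. It uses only the standard library, and the names, seed and sample size are illustrative. To compare a sample mean with $kx_m/(k-1)$, it draws Pareto samples with the inverse-CDF method, $x=x_m(1-u)^{-1/k}$ for $u$ uniform on $[0,1[$. This is a standard technique that the text itself does not describe. Infinite values are returned where the text says the moment does not exist.

```python
import math
import random

def pareto_pdf(x: float, xm: float, k: float) -> float:
    """Density (7.531): k xm^k / x^(k+1) for x >= xm."""
    return k * xm ** k / x ** (k + 1) if x >= xm else 0.0

def pareto_cdf(x: float, xm: float, k: float) -> float:
    """Cumulative distribution function (7.530)."""
    return 1.0 - (xm / x) ** k if x >= xm else 0.0

def pareto_mean(xm: float, k: float) -> float:
    """(7.534); returned as infinite (does not exist) when k <= 1."""
    return k * xm / (k - 1) if k > 1 else math.inf

def pareto_variance(xm: float, k: float) -> float:
    """(7.537); returned as infinite (does not exist) when k <= 2."""
    return k * xm ** 2 / ((k - 1) ** 2 * (k - 2)) if k > 2 else math.inf

def pareto_sample(xm: float, k: float, rng: random.Random) -> float:
    """Inverse-CDF sampling: solve u = 1 - (xm/x)^k for x (1 - u lies in ]0, 1])."""
    return xm * (1.0 - rng.random()) ** (-1.0 / k)

xm, k = 1.0, 3.0
# Scale invariance (7.532): f(c x) / f(x) = c^(-k-1) for every x >= xm
print(pareto_pdf(3.4, xm, k) / pareto_pdf(1.7, xm, k), 2.0 ** (-k - 1))
rng = random.Random(2024)
samples = [pareto_sample(xm, k, rng) for _ in range(200_000)]
print(sum(samples) / len(samples), pareto_mean(xm, k))   # sample mean vs 1.5
print(pareto_variance(xm, k), pareto_mean(xm, 0.8))      # 0.75 and inf
```

Heavy tails make the sample mean converge slowly, so a value of $k$ comfortably above 2 was chosen for the illustration.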
There is another important way to derive the family of Pareto distributions, which helps us understand many things about other distributions and which is often presented as follows.

Let us write $x_0$ the threshold beyond which we calculate the mean of the considered quantity, and $E(y)$ the mean beyond this threshold $x_0$, assumed proportional (linearly dependent) to the chosen threshold:

$$E(y) = a x_0 + b \tag{7.538}$$

This functional relation expresses the idea that the conditional mean beyond the threshold $x_0$ is a multiple of this threshold plus a constant, that is to say a linear function of the threshold. Thus, in project management for example, we could say that once a certain time threshold is exceeded, the expected duration is a multiple of this threshold plus a constant. If a linear relation of this type exists and is satisfied, then we speak of a probability distribution of the generalized Pareto family.

Consider the mean of the Bayesian conditional distribution given by (see section Probabilities):

$$E(Y \mid Y > x_0) = \frac{\int_{x_0}^{+\infty} y f(y)\,dy}{\int_{x_0}^{+\infty} f(y)\,dy} \tag{7.539}$$

If we write $F(y)$ the cumulative distribution function of $f(y)$, then we have by definition:

$$dF(y) = f(y)\,dy \tag{7.540}$$

Thus:

$$E(Y \mid Y > x_0) = \frac{\int_{x_0}^{+\infty} y\,dF(y)}{\int_{x_0}^{+\infty} dF(y)} \tag{7.541}$$

and if we define:

$$\bar{F}(x_0) = P(X > x_0) = 1 - F(x_0) \tag{7.542}$$

which we can identify with the "tail of the distribution", we get:

$$E(Y \mid Y > x_0) = \frac{1}{\bar{F}(x_0)}\int_{x_0}^{+\infty} y\,dF(y) \tag{7.543}$$

and therefore we seek the very special case where:

$$E(Y \mid Y > x_0) = \frac{1}{\bar{F}(x_0)}\int_{x_0}^{+\infty} y\,dF(y) = a x_0 + b \tag{7.544}$$

that is to say:

$$\int_{x_0}^{+\infty} y\,dF(y) = (a x_0 + b)\,\bar{F}(x_0) \tag{7.545}$$

Writing $x$ for the threshold $x_0$ and differentiating with respect to $x$, we find:

$$\frac{d}{dx}\left(\int_{x}^{+\infty} y\,dF(y)\right) = \frac{d}{dx}\Big((a x + b)\,\bar{F}(x)\Big) \tag{7.546}$$

The derivative of the integral above is the derivative of a constant (the value of the primitive at $+\infty$) minus the derivative of the primitive evaluated at the lower bound $x$.
So we have:

$$\frac{d}{dx}\int_x^{+\infty} y\,dF(y) = \frac{d}{dx}\int_x^{+\infty} y f(y)\,dy = -x f(x) = -x\frac{dF(x)}{dx} = a\bar{F}(x) + (a x + b)\frac{d\bar{F}(x)}{dx} \tag{7.547}$$

Thus:

$$-x\frac{dF(x)}{dx} = a\bar{F}(x) + (a x + b)\frac{d\bar{F}(x)}{dx} \tag{7.548}$$

and as:

$$d\bar{F} = d(1 - F) = -dF \tag{7.549}$$

it comes:

$$x\frac{d\bar{F}(x)}{dx} = a\bar{F}(x) + (a x + b)\frac{d\bar{F}(x)}{dx} \tag{7.550}$$

After simplification and rearrangement we obtain:

$$a\bar{F}(x)\,dx = -\big(x(a-1) + b\big)\,d\bar{F}(x) \tag{7.551}$$

which is a differential equation in $\bar{F}(x)$. Its resolution provides all the forms of the Pareto distributions sought, according to the values taken by the parameters $a$ and $b$.

To solve this differential equation, consider first the special case where $a > 1$, $b = 0$. Then we have:

$$a\bar{F}(x)\,dx = -x(a-1)\,d\bar{F}(x) \tag{7.552}$$

By writing:

$$k = \frac{a}{a-1} \tag{7.553}$$

(so that $k > 1$, or equivalently $a = k/(k-1)$), we then get:

$$\frac{dx}{x} = -\frac{1}{k}\frac{d\bar{F}(x)}{\bar{F}(x)} \tag{7.554}$$

and therefore:

$$\ln(x) = -\frac{1}{k}\ln\big(\bar{F}(x)\big) + c^{te} \tag{7.555}$$

It comes:

$$\ln\big(\bar{F}(x)\big) = -k\ln(x) + k\,c^{te} = k\ln\left(\frac{e^{c^{te}}}{x}\right) \tag{7.556}$$

and therefore:

$$\bar{F}(x) = \left(\frac{e^{c^{te}}}{x}\right)^k \tag{7.557}$$

Writing $x_m = e^{c^{te}}$, we have:

$$\bar{F}(x) = \left(\frac{x_m}{x}\right)^k \tag{7.558}$$

Then the cumulative distribution function follows:

$$F(x) = 1 - \bar{F}(x) = 1 - \left(\frac{x_m}{x}\right)^k \tag{7.559}$$

If we seek the density function, we differentiate with respect to $x$ to get:

$$f(x) = \frac{dF(x)}{dx} = \frac{k\,x_m^k}{x^{k+1}} \tag{7.560}$$

This is the Pareto distribution we have used since the beginning, called the "Pareto distribution of type I" (we will not study those of type II in this book). Consistently with (7.534), its conditional mean beyond a threshold $x_0 \ge x_m$ is $k x_0/(k-1) = a x_0$.

An interesting thing to observe is the resolution of the differential equation:

$$a\bar{F}(x)\,dx = -\big(x(a-1) + b\big)\,d\bar{F}(x) \tag{7.561}$$

when $a = 1$, $b > 0$. The differential equation then reduces to:

$$\bar{F}(x)\,dx = -b\,d\bar{F}(x) \tag{7.562}$$

Thus:

$$-\frac{dx}{b} = \frac{d\bar{F}(x)}{\bar{F}(x)} \tag{7.563}$$

After integration (with $\bar{F}(0) = 1$, i.e. a zero integration constant):

$$-\frac{x}{b} = \ln\big(\bar{F}(x)\big) \tag{7.564}$$
and therefore:

$$\bar{F}(x) = e^{-\frac{x}{b}} \tag{7.565}$$

If we make a small change in notation ($\lambda = 1/b$):

$$\bar{F}(x) = e^{-\lambda x} \tag{7.566}$$

and write the cumulative distribution function:

$$F(x) = 1 - \bar{F}(x) = 1 - e^{-\lambda x} \tag{7.567}$$

then by differentiating we get the density function of the exponential distribution:

$$f(x) = \lambda e^{-\lambda x} \tag{7.568}$$

So the exponential distribution has a conditional mean beyond the threshold equal to:

$$E(y) = a x_0 + b \overset{a=1}{=} x_0 + b \overset{b=1/\lambda}{=} x_0 + \frac{1}{\lambda} = x_0 + \sigma \tag{7.569}$$

So the conditional mean beyond a threshold is equal to the threshold itself plus the standard deviation of the distribution.

7.6.14 Exponential Distribution

We define the "exponential distribution" (or "exponential law") by the following density function:

$$P_\lambda(x) = \lambda e^{-\lambda x}\cdot 1_{[0,+\infty[}(x) \tag{7.570}$$

with $\lambda > 0$, which, as we will immediately see, is the inverse of the mean, and where $x$ is a memoryless random variable. This law is also sometimes denoted $\mathcal{E}(\lambda)$.

In fact the exponential distribution appears naturally from simple developments (see the Nuclear Physics chapter for example) under assumptions that impose a constant rate of aging (i.e. no aging) of the phenomenon. In the section on Quantitative Management, in the part on queueing theory, we also proved in detail that this law is memoryless. That is to say, the probability that the phenomenon occurs between the times $t$ and $t + s$, given that it has not occurred before $t$, is the same as the probability that it occurs between the times $0$ and $s$.

Remarks

R1. This function occurs frequently in nuclear physics (see chapter of the same name) or quantum physics (see also chapter of the same name) as well as in reliability (see Industrial Engineering) or in queueing theory (see section Quantitative Management).

R2. We can get this distribution in Microsoft Excel 11.8346 with the EXPONDIST() function.
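The linear mean-excess relation (7.538) can be checked numerically for the two solutions found above: with $b = 0$ the Pareto law of index $k$ gives $a = k/(k-1)$, and with $a = 1$ the exponential law gives $b = 1/\lambda$. The sketch below (Python, standard library only; all function names are hypothetical) computes $E(X \mid X > x_0)$ by quadrature and also checks the memoryless property $P(X > t + s \mid X > t) = P(X > s)$ of the exponential law.

```python
import math


def mean_excess(pdf, x0, n=200_000):
    """Approximate E(X | X > x0) = int_{x0}^{inf} x f(x) dx / int_{x0}^{inf} f(x) dx.

    The substitution x = x0 / u (x0 > 0 required) maps [x0, +inf) onto (0, 1],
    where a midpoint rule is applied to both the numerator and the denominator.
    """
    h = 1.0 / n
    num = den = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        x = x0 / u
        w = pdf(x) * x0 / u**2  # f(x) dx rewritten in terms of du
        num += x * w
        den += w
    return num / den


def pareto_pdf(x, x_m=1.0, k=3.0):
    """Pareto type I density (7.560)."""
    return k * x_m**k / x ** (k + 1) if x >= x_m else 0.0


def expo_pdf(x, lam=0.5):
    """Exponential density (7.568)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def expo_survival(t, lam=0.5):
    """Tail F-bar(t) = P(X > t) = exp(-lam * t), see (7.566)."""
    return math.exp(-lam * t)


# Pareto with k = 3: a = k / (k - 1) = 1.5 and b = 0, so E(X | X > 2) should be 3
print(mean_excess(pareto_pdf, 2.0))
# Exponential with lam = 0.5: a = 1 and b = 1 / lam = 2, so E(X | X > 3) should be 5
print(mean_excess(expo_pdf, 3.0))
# Memorylessness: P(X > t + s | X > t) = P(X > s)
t, s = 1.7, 2.3
print(expo_survival(t + s) / expo_survival(t), expo_survival(s))
```

The Pareto case also confirms the relation $k = a/(a-1)$ used when solving (7.552).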
It is indeed a density function because it satisfies:

$$\int_{-\infty}^{+\infty}P_\lambda(x)\,dx = \lambda\int_0^{+\infty}e^{-\lambda x}\,dx = \lambda\left[-\frac{1}{\lambda}e^{-\lambda x}\right]_0^{+\infty} = -(0-1) = 1 \tag{7.571}$$

The exponential distribution has for expected mean, using integration by parts:

$$\mu = \int_{-\infty}^{+\infty}xP_\lambda(x)\,dx = \lambda\int_0^{+\infty}xe^{-\lambda x}\,dx = \left[-xe^{-\lambda x}\right]_0^{+\infty} + \int_0^{+\infty}e^{-\lambda x}\,dx = \left[-\frac{1}{\lambda}e^{-\lambda x}\right]_0^{+\infty} = \frac{1}{\lambda} \tag{7.572}$$

and for the variance, using once again the Huygens relation:

$$V(X) = E(X^2) - E(X)^2 \tag{7.573}$$

it only remains for us to calculate:

$$E(X^2) = \int_0^{+\infty}\lambda x^2e^{-\lambda x}\,dx \tag{7.574}$$

A change of variable $y = \lambda x$ leads us to:

$$E(X^2) = \frac{1}{\lambda^2}\int_0^{+\infty}y^2e^{-y}\,dy \tag{7.575}$$

A double integration by parts, using

$$\int_a^b f(t)g'(t)\,dt = \big[f(t)g(t)\big]_a^b - \int_a^b f'(t)g(t)\,dt$$

gives us:

$$\int_0^{+\infty}y^2e^{-y}\,dy = \left[-y^2e^{-y}\right]_0^{+\infty} + 2\int_0^{+\infty}ye^{-y}\,dy = 2\left[-ye^{-y}\right]_0^{+\infty} + 2\int_0^{+\infty}e^{-y}\,dy = 2 \tag{7.576}$$

Hence:

$$E(X^2) = \frac{2}{\lambda^2} \tag{7.577}$$

and therefore:

$$V(X) = E(X^2) - E(X)^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2} \tag{7.578}$$

So the standard deviation (the square root of the variance, as a reminder) and the mean have exactly the same expression!

Here is a plot example of the exponential distribution and cumulative distribution for the parameter $\lambda = 1$:

Figure 4.97 - Exponential law (mass and cumulative distribution function)

Now let us determine the cumulative distribution function of the exponential law:

$$P(X \le x) = \int_0^x\lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_0^x = 1 - e^{-\lambda x} \tag{7.579}$$

Remark

We will see later that the exponential distribution is a special case of a more general distribution, the chi-square distribution, which is itself a special case of a more general distribution, the Gamma distribution. This is a very important property used in the "Poisson test" for rare events (see also below).

7.6.15 Cauchy Distribution

Let $X$, $Y$ be two independent random variables following a centered reduced Normal distribution (with zero mean and unit variance).
Thus the density function of each variable is given by:

$$f_X(x) = f_Y(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} \tag{7.580}$$

The random variable:

$$T = \frac{X}{|Y|} \tag{7.581}$$

(the absolute value will be useful in an integral during a change of variable) follows a characteristic law named the "Cauchy distribution" (or "Cauchy law") or even the "Lorentz law". Let us now determine its density function $f$. To do this, recall that $f$ is determined by the (general) relation:

$$\forall t \in \mathbb{R},\quad P(T \le t) = \int_{-\infty}^{t}f(x)\,dx \tag{7.582}$$

So (application of elementary differential calculus):

$$f(t) = \frac{d}{dt}P(T \le t) \tag{7.583}$$

wherever $f$ is continuous. Since $X$ and $Y$ are independent, the density function of the random vector is given by one of the axioms of probabilities (see section Probabilities):

$$(x, y) \mapsto f_X(x)\cdot f_Y(y) \tag{7.584}$$

therefore:

$$P(T \le t) = P\left(\frac{X}{|Y|} \le t\right) = P(X \le t|Y|) = \iint_D f_X(x)\cdot f_Y(y)\,dx\,dy \tag{7.585}$$

where $D = \{(x, y) \mid x \le t|y|\}$. This last integral becomes:

$$\iint_D f_X(x)\cdot f_Y(y)\,dx\,dy = \int_{-\infty}^{+\infty}\int_{-\infty}^{t|y|}f_X(x)\cdot f_Y(y)\,dx\,dy \tag{7.586}$$

Let us make the change of variable $x = u|y|$ in the inner integral. We obtain:

$$P(T \le t) = \int_{-\infty}^{+\infty}\int_{-\infty}^{t}f_X(u|y|)\cdot f_Y(y)\,|y|\,du\,dy = \int_{-\infty}^{t}\int_{-\infty}^{+\infty}f_X(u|y|)\cdot f_Y(y)\,|y|\,dy\,du \tag{7.587}$$

Therefore:

$$f(t) = \frac{d}{dt}P(T \le t) = \int_{-\infty}^{+\infty}f_X(t|y|)\cdot f_Y(y)\,|y|\,dy = \frac{1}{2\pi}\int_{-\infty}^{+\infty}e^{-\frac{(t^2+1)y^2}{2}}|y|\,dy \tag{7.588}$$

Now the absolute value will be useful to write:

$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{0}e^{-\frac{(t^2+1)y^2}{2}}(-y)\,dy + \frac{1}{2\pi}\int_{0}^{+\infty}e^{-\frac{(t^2+1)y^2}{2}}y\,dy \tag{7.589}$$

The change of variable $y \to -y$ shows that the first integral is equal to the second one, so:

$$f(t) = \frac{1}{\pi}\int_0^{+\infty}e^{-\frac{(t^2+1)y^2}{2}}y\,dy \tag{7.590}$$

Making the change of variable $v = y^2$ (so $y\,dy = dv/2$), we get:

$$f(t) = \frac{1}{2\pi}\int_0^{+\infty}e^{-\frac{(t^2+1)v}{2}}\,dv = \frac{1}{2\pi}\left[-\frac{2}{t^2+1}e^{-\frac{(t^2+1)v}{2}}\right]_0^{+\infty} = \frac{1}{\pi(t^2+1)} \tag{7.591}$$

which we will denote thereafter (to respect the notations adopted so far):

$$P(x) = \frac{1}{\pi(x^2+1)} \tag{7.592}$$

and that is simply the so-called Cauchy distribution.
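The construction (7.581) lends itself to a small Monte Carlo experiment: simulate $T = X/|Y|$ from two independent standard Normal draws and compare interval frequencies with integrals of the density (7.592). This is a hedged sketch in Python using only the standard `random` and `math` modules; the seed, sample size and helper names are arbitrary choices.

```python
import math
import random


def cauchy_pdf(x):
    """Cauchy density (7.592): 1 / (pi * (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))


def ratio_sample(rng):
    """One draw of T = X / |Y| with X, Y independent N(0, 1), as in (7.581)."""
    y = 0.0
    while y == 0.0:  # guard against the (practically impossible) draw Y = 0
        y = abs(rng.gauss(0.0, 1.0))
    return rng.gauss(0.0, 1.0) / y


def midpoint(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


rng = random.Random(42)
samples = [ratio_sample(rng) for _ in range(100_000)]
for lo, hi in [(-1.0, 1.0), (0.0, 3.0), (2.0, 10.0)]:
    empirical = sum(lo <= t <= hi for t in samples) / len(samples)
    theoretical = midpoint(cauchy_pdf, lo, hi)
    print(f"P({lo} <= T <= {hi}): simulated {empirical:.4f}, from density {theoretical:.4f}")
```

With 100,000 draws the simulated frequencies typically agree with the integrated density to about two or three decimals.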
It is indeed a density function because it satisfies (see section Differential and Integral Calculus):

$$\int_{-\infty}^{+\infty}P(x)\,dx = \frac{1}{\pi}\int_{-\infty}^{+\infty}\frac{dx}{1+x^2} = \frac{1}{\pi}\big(\arctan(+\infty) - \arctan(-\infty)\big) = \frac{1}{\pi}\left(\frac{\pi}{2} - \left(-\frac{\pi}{2}\right)\right) = 1 \tag{7.593}$$

It is then obvious that we get for the cumulative distribution function:

$$P(X \le x) = \int_{-\infty}^{x}P(s)\,ds = \frac{1}{\pi}\int_{-\infty}^{x}\frac{ds}{s^2+1} = \frac{1}{\pi}\big(\arctan(x) - \arctan(-\infty)\big) = \frac{1}{\pi}\left(\arctan(x) + \frac{\pi}{2}\right) = \frac{1}{\pi}\arctan(x) + \frac{1}{2} \tag{7.594}$$

Here is a plot example of the Cauchy distribution.

The Cauchy distribution has for expected mean:

$$\mu = \int_{-\infty}^{+\infty}xP(x)\,dx = \frac{1}{\pi}\int_{-\infty}^{+\infty}\frac{x}{1+x^2}\,dx = \frac{1}{2\pi}\big[\ln(1+x^2)\big]_{-\infty}^{+\infty} = \frac{1}{2\pi}\big(\ln(+\infty) - \ln(+\infty)\big) \overset{?}{=} 0 \tag{7.595}$$

Caution!!!! The above calculation does not actually give zero, because the difference of two infinities is not zero but indeterminate! Strictly speaking, the Cauchy distribution therefore does not admit an expected mean!

Thus, even if we try to build a variance (taking $\mu = 0$):

$$\sigma^2 = \int_{-\infty}^{+\infty}(x-\mu)^2 P(x)\,dx = \frac{1}{\pi}\int_{-\infty}^{+\infty}\frac{x^2}{1+x^2}\,dx = \frac{1}{\pi}\lim_{t\to+\infty}\int_{-t}^{t}\left(1 - \frac{1}{1+x^2}\right)dx = \frac{2}{\pi}\lim_{t\to+\infty}\big(t - \arctan(t)\big) = +\infty \tag{7.596}$$

this result is absurd and, strictly speaking, the variance does not exist, since the mean itself does not exist!

The Cauchy distribution is used a lot in financial engineering because it is heavy-tailed and therefore a good candidate for predicting extreme values more accurately than the Normal distribution, whose tails decrease too quickly. Furthermore, the Cauchy distribution is a heavy-tailed law with support on $\mathbb{R}$, whereas the Pareto distribution (also heavy-tailed) is defined only on $\mathbb{R}_+$.

The Cauchy distribution is one of the most famous distribution functions that... we cannot find in spreadsheet software like Microsoft Excel.
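Since spreadsheet software offers no Cauchy function, the CDF (7.594) is easy to code directly. The sketch below (Python, illustrative names) also evaluates the truncated second moment from (7.596), $\frac{2}{\pi}(t - \arctan t)$, and cross-checks it by quadrature, showing numerically that it grows without bound as $t$ increases.

```python
import math


def cauchy_cdf(x):
    """Cauchy CDF (7.594): arctan(x) / pi + 1/2."""
    return math.atan(x) / math.pi + 0.5


def truncated_second_moment(t):
    """(1/pi) * integral over [-t, t] of x^2 / (1 + x^2) = (2/pi) * (t - arctan t), cf. (7.596)."""
    return 2.0 / math.pi * (t - math.atan(t))


def truncated_second_moment_numeric(t, n=200_000):
    """Same truncated integral evaluated with the midpoint rule, as a cross-check."""
    h = 2.0 * t / n
    total = 0.0
    for i in range(n):
        x = -t + (i + 0.5) * h
        total += x * x / (math.pi * (1.0 + x * x))
    return total * h


print(cauchy_cdf(0.0), cauchy_cdf(1.0))  # 1/2 and (about) 3/4
for t in [10.0, 100.0, 1_000.0]:
    # The truncated "variance" keeps growing roughly like 2t/pi: no finite limit exists.
    print(t, truncated_second_moment(t), truncated_second_moment_numeric(t))
```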
To obtain the closed form of the inverse Cauchy CDF, we start from the CDF proved previously:

$$P(X \le x) = \frac{1}{\pi}\arctan(x) + \frac{1}{2} \tag{7.597}$$

and therefore, if we let:

$$y = \frac{1}{\pi}\arctan(x) + \frac{1}{2} \tag{7.598}$$

we immediately get the inverse CDF:

$$x = \tan\left(\pi\left(y - \frac{1}{2}\right)\right) \tag{7.599}$$

This is useful in finance, as we know (see section Theoretical Computing) that to simulate a Cauchy variable we can use inverse transform sampling:

$$X = \tan\left(\pi\left(U - \frac{1}{2}\right)\right), \qquad U \sim \mathcal{U}([0,1]) \tag{7.600}$$

7.6.16 Beta Distribution

Let us first recall that the Euler Gamma function is defined by the relation (see section Differential And Integral Calculus):

$$\Gamma(z) = \int_0^{+\infty}e^{-x}x^{z-1}\,dx \tag{7.601}$$

We proved (see section Differential And Integral Calculus) that a non-trivial property of this function is:

$$\Gamma(z+1) = z\Gamma(z) \tag{7.602}$$

Let us now write:

$$\Gamma(a)\Gamma(b) = \lim_{R\to+\infty}\iint_{A_R}e^{-(x+y)}x^{a-1}y^{b-1}\,dx\,dy \tag{7.603}$$

where:

$$A_R = \{(x, y) \mid x \ge 0,\ y \ge 0,\ x + y \le R\} \tag{7.604}$$

By the change of variables:

$$u = x + y, \qquad v = y \tag{7.605}$$

(whose Jacobian is equal to 1) we get:

$$\Gamma(a)\Gamma(b) = \lim_{R\to+\infty}\int_0^R e^{-u}\left(\int_0^u (u-v)^{a-1}v^{b-1}\,dv\right)du \tag{7.606}$$

For the inner integral we now use the substitution $v = ut$, $0 \le t \le 1$, and therefore we find:

$$\Gamma(a)\Gamma(b) = \lim_{R\to+\infty}\int_0^R e^{-u}u^{a+b-1}\,du\int_0^1(1-t)^{a-1}t^{b-1}\,dt = B(a,b)\int_0^{+\infty}e^{-u}u^{a+b-1}\,du = B(a,b)\,\Gamma(a+b) \tag{7.607}$$

The function $B$ that appears in the expression above (the substitution $t \mapsto 1-t$ shows it is the integral (7.610) below) is named the "beta function", and therefore we have:

$$B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} \tag{7.608}$$

Now that we have defined the beta function, consider two parameters $a > 0$, $b > 0$ and consider also the special relation below, named the "beta distribution" or "beta law" (there are several formulations of the beta distribution, and a very important one is studied in detail in the section of Quantitative Management):

$$P_{a,b}(x) = \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)}\cdot 1_{]0,1[}(x) \tag{7.609}$$

where:

$$B(a,b) = \int_0^1 y^{a-1}(1-y)^{b-1}\,dy \tag{7.610}$$

We first check (without going into too much detail...) that $P_{a,b}(x)$ is indeed a density function:

$$\int_{-\infty}^{+\infty}P_{a,b}(x)\,dx = \int_{-\infty}^{+\infty}\frac{x^{a-1}(1-x)^{b-1}}{B(a,b)}1_{]0,1[}(x)\,dx = \frac{1}{B(a,b)}\int_0^1 x^{a-1}(1-x)^{b-1}\,dx = \frac{B(a,b)}{B(a,b)} = 1 \tag{7.611}$$

Let us now calculate the expected mean:

$$\mu = \int_{-\infty}^{+\infty}xP_{a,b}(x)\,dx = \frac{1}{B(a,b)}\int_0^1 x^{a}(1-x)^{b-1}\,dx = \frac{B(a+1,b)}{B(a,b)} = \frac{\Gamma(a+1)\Gamma(b)}{\Gamma(a+b+1)}\cdot\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} = \frac{a}{a+b} \tag{7.612}$$

by using the relation:

$$\Gamma(z+1) = z\Gamma(z) \tag{7.613}$$

and its variance:

$$\sigma^2 = \int_0^1(x-\mu)^2P_{a,b}(x)\,dx = \frac{1}{B(a,b)}\left(\int_0^1 x^{a+1}(1-x)^{b-1}\,dx - 2\mu\int_0^1 x^{a}(1-x)^{b-1}\,dx + \mu^2\int_0^1 x^{a-1}(1-x)^{b-1}\,dx\right) = \frac{B(a+2,b) - 2\mu B(a+1,b) + \mu^2B(a,b)}{B(a,b)} = \frac{B(a+2,b) - \mu^2B(a,b)}{B(a,b)} \tag{7.614}$$

(using $B(a+1,b) = \mu B(a,b)$ from (7.612)). As we know that:

$$\Gamma(z+1) = z\Gamma(z) \quad\text{and}\quad B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} \tag{7.615}$$

we find:

$$\frac{B(a+2,b)}{B(a,b)} = \frac{(a+1)a}{(a+b+1)(a+b)} = \frac{\mu(a+1)}{a+b+1} \tag{7.616}$$

and therefore:

$$\sigma^2 = \frac{\mu(a+1)}{a+b+1} - \mu^2 = \frac{ab}{(a+b)^2(a+b+1)} \tag{7.617}$$

Examples of plots of the beta density function are drawn for $(a,b) = (0.1, 0.5)$ in red, $(0.3, 0.5)$ in green, $(0.5, 0.5)$ in black, $(0.8, 0.8)$ in blue, $(1, 1)$ in magenta, $(1, 1.5)$ in cyan, $(1, 2)$ in gray, $(1.5, 2)$ in turquoise, $(2, 2)$ in yellow and $(3, 3)$ in gold.
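The beta function identity (7.608) and the moments (7.612) and (7.617) can be sanity-checked numerically. The following Python sketch (standard library only, hypothetical helper names) computes $B(a,b)$ through `math.gamma` and integrates the density on $]0,1[$ with a midpoint rule.

```python
import math


def beta_fn(a, b):
    """Beta function via (7.608): B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_pdf(x, a, b):
    """Beta density (7.609) on ]0, 1[, zero elsewhere."""
    if not 0.0 < x < 1.0:
        return 0.0
    return x ** (a - 1) * (1.0 - x) ** (b - 1) / beta_fn(a, b)


def numeric_moments(a, b, n=100_000):
    """Total mass, mean and variance of the beta density by the midpoint rule on ]0, 1[."""
    h = 1.0 / n
    xs = [(i + 0.5) * h for i in range(n)]
    ps = [beta_pdf(x, a, b) for x in xs]
    m0 = h * sum(ps)
    m1 = h * sum(x * p for x, p in zip(xs, ps))
    m2 = h * sum(x * x * p for x, p in zip(xs, ps))
    return m0, m1, m2 - m1**2


a, b = 2.0, 3.0
mass, mean, var = numeric_moments(a, b)
print(mass, mean, a / (a + b))                    # mass close to 1, mean close to 0.4
print(var, a * b / ((a + b) ** 2 * (a + b + 1)))  # variance close to 0.04
```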
Here is a plot example of the beta distribution and cumulative distribution for the parameters $(a, b) = (2, 3)$:

Figure 4.100 - Beta law (mass and cumulative distribution function)

7.6.17 Gamma Distribution

The Euler Gamma function being known, consider two parameters $a > 0$, $\lambda > 0$ and let us define the "Gamma distribution" (or "Gamma law") by the density function:

$$P_{a,\lambda}(x) = \frac{x^{a-1}e^{-\lambda x}}{\int_0^{+\infty}x^{a-1}e^{-\lambda x}\,dx}\cdot 1_{]0,+\infty[}(x) \tag{7.618}$$

By the change of variable $t = \lambda x$ we obtain:

$$\int_0^{+\infty}x^{a-1}e^{-\lambda x}\,dx = \frac{1}{\lambda^a}\int_0^{+\infty}t^{a-1}e^{-t}\,dt = \frac{\Gamma(a)}{\lambda^a} \tag{7.619}$$

and we can then write the relation in a more conventional form, frequently found in the literature:

$$P_{a,\lambda}(x) = \frac{\lambda^a}{\Gamma(a)}x^{a-1}e^{-\lambda x}\cdot 1_{]0,+\infty[}(x) \tag{7.620}$$

It is under this form that we find this distribution in Microsoft Excel 11.8346, under the name GAMMADIST() (which takes as parameter the scale $1/\lambda$), and its inverse under GAMMAINV().

Let us now see a simple property of the Gamma distribution that will be useful, in particular, for the study of the Welch statistical test. First recall that we have shown above that, for any constant $c > 0$:

$$\forall y \in \mathbb{R},\quad f_{cX}(y) = \frac{1}{c}f_X\left(\frac{y}{c}\right) \tag{7.621}$$

Let us write $Y = cX$ with $X$ following the Gamma distribution $P_{a,\lambda}$; then we immediately have:

$$f_Y(y) = \frac{1}{c}P_{a,\lambda}\left(\frac{y}{c}\right) = \frac{(\lambda/c)^a}{\Gamma(a)}y^{a-1}e^{-\frac{\lambda}{c}y} = P_{a,\lambda/c}(y) \tag{7.622}$$

So multiplying a random variable that follows a Gamma distribution by a constant only has the effect of dividing the parameter $\lambda$ by that same constant. This is the reason why $\lambda$ is tied to the "scale" of the distribution (strictly speaking, $1/\lambda$ is the scale parameter and $\lambda$ the rate parameter).

If $a \in \mathbb{N}^*$, the Gamma function in the denominator becomes (see section Differential And Integral Calculus) the factorial $(a-1)!$. The Gamma density can then be written:

$$P_{a,\lambda}(x) = \frac{\lambda^a x^{a-1}e^{-\lambda x}}{(a-1)!} = \frac{(\lambda x)^{a-1}}{(a-1)!}\lambda e^{-\lambda x} \tag{7.623}$$

This particular form of the Gamma distribution is named the "Erlang distribution"; we find it naturally in queueing theory and it is very important in practice!
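The scaling property (7.622) and the Erlang form (7.623) can be verified pointwise. In this Python sketch (illustrative names, standard library only), `gamma_pdf` uses the rate $\lambda$ as in (7.620), which differs from the scale convention used by Excel's GAMMADIST().

```python
import math


def gamma_pdf(x, a, lam):
    """Gamma density (7.620): lam^a x^(a-1) e^(-lam x) / Gamma(a) for x > 0."""
    if x <= 0.0:
        return 0.0
    return lam**a * x ** (a - 1) * math.exp(-lam * x) / math.gamma(a)


def density_of_scaled(y, c, a, lam):
    """Density of Y = c X (c > 0) when X follows gamma(a, lam): f_Y(y) = f_X(y / c) / c."""
    return gamma_pdf(y / c, a, lam) / c


def erlang_pdf(x, a, lam):
    """Erlang form (7.623) for an integer a >= 1: (lam x)^(a-1) / (a-1)! * lam e^(-lam x)."""
    return (lam * x) ** (a - 1) / math.factorial(a - 1) * lam * math.exp(-lam * x)


c, a, lam = 2.5, 3.2, 1.5
for y in [0.4, 1.3, 5.0]:
    # Scaling X by c only divides lam by c, as stated in (7.622)
    print(density_of_scaled(y, c, a, lam), gamma_pdf(y, a, lam / c))
print(erlang_pdf(1.1, 4, 2.0), gamma_pdf(1.1, 4, 2.0))
```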
We then check, with reasoning similar to that used for the beta distribution, that $P_{a,\lambda}(x)$ is a density function:

$$\int_{-\infty}^{+\infty}P_{a,\lambda}(x)\,dx = 1 \tag{7.624}$$

Examples of plots of the Gamma density function are drawn for $(a, \lambda) = (0.5, 1)$ in red, $(1, 1)$ in green, $(2, 1)$ in black, $(4, 2)$ in blue and $(16, 8)$ in magenta, and here is a plot example of the Gamma distribution and cumulative distribution for the parameters $(a, \lambda) = (4, 1)$:

Figure 4.102 - Gamma law (mass and cumulative distribution function)

The Gamma distribution has for expected mean:

$$\mu = \int_{-\infty}^{+\infty}xf(x)\,dx = \frac{\lambda^a}{\Gamma(a)}\int_0^{+\infty}x^{a}e^{-\lambda x}\,dx = \frac{\lambda^a}{\Gamma(a)}\frac{\Gamma(a+1)}{\lambda^{a+1}} = \frac{\lambda^a}{\Gamma(a)}\frac{a\Gamma(a)}{\lambda^{a+1}} = \frac{a}{\lambda} \tag{7.625}$$

and for variance:

$$\begin{aligned}\sigma^2 &= \int_{-\infty}^{+\infty}(x-\mu)^2f(x)\,dx = \frac{\lambda^a}{\Gamma(a)}\left(\int_0^{+\infty}x^{a+1}e^{-\lambda x}\,dx - 2\mu\int_0^{+\infty}x^{a}e^{-\lambda x}\,dx + \mu^2\int_0^{+\infty}x^{a-1}e^{-\lambda x}\,dx\right)\\ &= \frac{\lambda^a}{\Gamma(a)}\left(\frac{\Gamma(a+2)}{\lambda^{a+2}} - 2\frac{a}{\lambda}\frac{\Gamma(a+1)}{\lambda^{a+1}} + \frac{a^2}{\lambda^2}\frac{\Gamma(a)}{\lambda^{a}}\right) = \frac{\Gamma(a+2) + a^2\Gamma(a) - 2a\Gamma(a+1)}{\lambda^2\Gamma(a)}\\ &= \frac{(a+1)a\Gamma(a) + a^2\Gamma(a) - 2a^2\Gamma(a)}{\lambda^2\Gamma(a)} = \frac{a}{\lambda^2}\end{aligned} \tag{7.626}$$

Let us now prove a property of the Gamma distribution that will allow us, later in this chapter, during our study of the analysis of variance and of confidence intervals based on small samples, to establish another extremely important property of the chi-square distribution.

As we know, the density function of a random variable following a Gamma distribution of parameters $a, \lambda > 0$ is:

$$P_{a,\lambda}(x) = f(x) = \frac{\lambda^a}{\Gamma(a)}x^{a-1}e^{-\lambda x}\cdot 1_{[0,+\infty[}(x) \tag{7.627}$$

with (see section Differential And Integral Calculus) the Euler Gamma function:

$$\Gamma(a) = \int_0^{+\infty}e^{-x}x^{a-1}\,dx \tag{7.628}$$

Moreover, when a random variable follows a Gamma distribution we often denote it in the following way:

$$X = \gamma(a, \lambda) \tag{7.629}$$

Let $X$, $Y$ be two independent variables.
We will prove that if $X = \gamma(p, \lambda)$ and $Y = \gamma(q, \lambda)$, hence with the same parameter $\lambda$, then:

$$X + Y = \gamma(p + q, \lambda) \tag{7.630}$$

We write $f$ the density function of the pair $(X, Y)$, $f_X$ the density function of $X$ and $f_Y$ the density function of $Y$. Because $X$ and $Y$ are independent, we have:

$$f(x, y) = f_X(x)\cdot f_Y(y) \tag{7.631}$$

for all $x, y > 0$. Let $Z = X + Y$. The distribution function of $Z$ is therefore:

$$F(z) = P(Z \le z) = P(X + Y \le z) = \iint_D f(x, y)\,dx\,dy \tag{7.632}$$

where $D = \{(x, y) \mid x + y \le z\}$.

Remark

As we already know, such a calculation is named a "convolution", and statisticians often have to handle such objects because they work with many random variables that they have to add or even multiply.

Simplifying:

$$F(z) = \int_{-\infty}^{+\infty}\int_{-\infty}^{z-x}f_X(x)f_Y(y)\,dy\,dx \tag{7.633}$$

We perform the change of variables $x = x$, $y = s - x$. The Jacobian is therefore (see Differential And Integral Calculus):

$$J = \begin{vmatrix}\dfrac{\partial x}{\partial x} & \dfrac{\partial x}{\partial s}\\[2mm] \dfrac{\partial y}{\partial x} & \dfrac{\partial y}{\partial s}\end{vmatrix} = \begin{vmatrix}1 & 0\\ -1 & 1\end{vmatrix} = 1 \tag{7.634}$$

Therefore, with the new upper integration limit $s = x + y = x + (z - x) = z$, we have:

$$F(z) = \int_{-\infty}^{+\infty}\int_{-\infty}^{z}f_X(x)f_Y(s-x)\,ds\,dx = \int_{-\infty}^{z}\int_{-\infty}^{+\infty}f_X(x)f_Y(s-x)\,dx\,ds \tag{7.635}$$

If we denote by $g$ the density function of $Z$, we have:

$$F(z) = \int_{-\infty}^{z}\int_{-\infty}^{+\infty}f_X(x)f_Y(s-x)\,dx\,ds = \int_{-\infty}^{z}g(s)\,ds \tag{7.636}$$

Then it follows:

$$g(s) = \int_{-\infty}^{+\infty}f_X(x)f_Y(s-x)\,dx \tag{7.637}$$

$f_X$ and $f_Y$ being zero when their argument is negative, we can change the limits of integration:

$$g(s) = \int_0^s f_X(x)f_Y(s-x)\,dx \quad\text{for } s \ge 0 \tag{7.638}$$

Let us calculate $g$:

$$g(s) = \frac{\lambda^{p+q}e^{-\lambda s}}{\Gamma(p)\Gamma(q)}\int_0^s x^{p-1}(s-x)^{q-1}\,dx \tag{7.639}$$

After the change of variable $x = st$ we obtain:

$$g(s) = \frac{\lambda^{p+q}e^{-\lambda s}}{\Gamma(p)\Gamma(q)}s^{p+q-1}\int_0^1 t^{p-1}(1-t)^{q-1}\,dt = \frac{\lambda^{p+q}e^{-\lambda s}}{\Gamma(p)\Gamma(q)}s^{p+q-1}B(p, q) \tag{7.640}$$

where $B$ is the beta function we saw earlier in our study of the beta distribution.
But we have also proved the relation:

$$B(p, q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)} \tag{7.641}$$

Therefore:

$$g(s) = \frac{\lambda^{p+q}e^{-\lambda s}}{\Gamma(p+q)}s^{p+q-1} \tag{7.642}$$

More explicitly, with $s = x + y$:

$$g(x+y) = \frac{\lambda^{p+q}}{\Gamma(p+q)}(x+y)^{p+q-1}e^{-\lambda(x+y)} \tag{7.643}$$

Which finally gives us, for the distribution function of $Z$:

$$F(s) = \int_0^s g(u)\,du = \int_0^s\frac{\lambda^{p+q}}{\Gamma(p+q)}u^{p+q-1}e^{-\lambda u}\,du \tag{7.644}$$

This shows that if two independent random variables follow Gamma distributions with the same parameter $\lambda$, then their sum also follows a Gamma distribution, with parameters:

$$X + Y = \gamma(p + q, \lambda) \tag{7.645}$$

So the Gamma distribution is stable under addition (for a common $\lambda$), as are several of the distributions arising from the Gamma distribution that we will see below (the chi-square distribution, for example).

7.6.18 Generalized Gamma Distribution

The generalized Gamma distribution is a continuous probability distribution with three parameters. It is a generalization of the two-parameter Gamma distribution. Since many distributions commonly used for parametric models in survival analysis (such as the exponential, Weibull, Gamma and log-normal distributions) are special cases of the generalized Gamma, it is sometimes used to determine which parametric model is appropriate for a given set of data.

Therefore let us notice that if we write, after trial and error, the following density function named "generalized Gamma law":

(7.646)

with $x > 0$, $\alpha > 0$, $\eta > 0$, $\kappa > 0$.
Then, for $\kappa = 1$, we fall back on the density function of the Weibull law (see section Industrial Engineering), which with the notations of the corresponding section is given by:

$$f(x) = \frac{\alpha x^{\alpha-1}}{\beta^{\alpha}}e^{-\left(\frac{x}{\beta}\right)^{\alpha}} = \frac{\alpha}{\beta}\left(\frac{x}{\beta}\right)^{\alpha-1}e^{-\left(\frac{x}{\beta}\right)^{\alpha}} \tag{7.647}$$

For $\eta = 1$, we fall back on the Gamma density function just introduced:

$$f(x) = \frac{\lambda^{\alpha}x^{\alpha-1}}{\Gamma(\alpha)}e^{-\lambda x}\cdot 1_{]0,+\infty[}(x) \tag{7.648}$$

For $\kappa = 1$ and $\eta = 1$, we fall back on the exponential distribution also seen previously:

$$f(x) = \lambda e^{-\lambda x}\cdot 1_{[0,+\infty[}(x) \tag{7.649}$$

and finally, for $\eta \to 0$ and $\kappa \to +\infty$, we fall back on a log-normal distribution after developing the limits using the Stirling, L'Hospital and Taylor techniques (see sections Functional Analysis, Sequences and Series, Differential and Integral Calculus):

$$f(x) = \frac{1}{\sigma x\sqrt{2\pi}}e^{-\frac{(\ln(x)-\mu)^2}{2\sigma^2}} \tag{7.650}$$

As always, on request we can detail the developments!

7.6.19 Chi-Square (Pearson) Distribution

The "chi-square distribution" (also called "chi-square law" or "Pearson law") has a very important place in industrial practice for some common hypothesis tests (see far below...) and is by definition only a particular case of the Gamma distribution, with $a = k/2$ and $\lambda = 1/2$, where $k$ is a positive integer:

$$P_k(x) = \frac{1}{2^{k/2}\Gamma(k/2)}x^{\frac{k}{2}-1}e^{-\frac{x}{2}}\cdot 1_{[0,+\infty[}(x) \tag{7.651}$$

This relation between the chi-square distribution and the Gamma distribution is important in Microsoft Excel 11.8346, because the function CHIDIST() returns the confidence level (the right-tail probability) and not the distribution function. You must then use the function GAMMADIST() with the parameters given above (except that you must pass the inverse of 1/2, that is 2, as parameter, since Excel uses the scale $1/\lambda$) to get the density and cumulative functions.

The reader who wishes to check that the chi-square distribution is only a special case of the Gamma distribution can enter in Microsoft Excel 14.0.6123 the two formulas below, which return the same value:

=CHISQ.DIST(2*x,2*k,TRUE)
=GAMMA.DIST(x,k,1,TRUE)
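Both identities stated above can be checked at the density level: (7.651) says the $\chi^2_k$ density equals the Gamma density with $a = k/2$, $\lambda = 1/2$, and the spreadsheet identity between CHISQ.DIST and GAMMA.DIST follows from (7.622), since if $X = \gamma(k, 1)$ then $2X = \gamma(k, 1/2) = \chi^2(2k)$. A hedged Python sketch with hypothetical helper names:

```python
import math


def gamma_pdf(x, a, lam):
    """Gamma density (7.620) with rate lam (Excel's GAMMA.DIST takes the scale 1/lam instead)."""
    if x <= 0.0:
        return 0.0
    return lam**a * x ** (a - 1) * math.exp(-lam * x) / math.gamma(a)


def chi2_pdf(x, k):
    """Chi-square density (7.651) with k degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))


x = 1.7
for k in (1, 2, 5, 10):
    # (7.651): chi-square(k) is the Gamma law with a = k/2 and lam = 1/2
    print(k, chi2_pdf(x, k), gamma_pdf(x, k / 2, 0.5))
    # Density form of CHISQ.DIST(2x, 2k) = GAMMA.DIST(x, k, 1): the density of 2X at 2x,
    # times the Jacobian 2, must equal the density of X = gamma(k, 1) at x
    print(k, 2 * chi2_pdf(2 * x, 2 * k), gamma_pdf(x, k, 1.0))
```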
All calculations made previously still apply, and we immediately get:

$$\mu = k, \qquad \sigma^2 = 2k \tag{7.652}$$

(since $a/\lambda = (k/2)/(1/2) = k$ and $a/\lambda^2 = (k/2)/(1/4) = 2k$).

Examples of plots of the chi-2 density function are drawn for $k = 1$ in red, $k = 3$ in green and $k = 4$ in blue (plus one further curve in black), and here is a plot example of the chi-2 distribution and cumulative distribution for the parameter $k = 2$:

Figure 4.104 - Chi-2 $\chi^2$ law (mass and cumulative distribution function)

In the literature, it is traditional to write:

$$X = \chi^2_k \quad\text{or}\quad X = \chi^2(k) \tag{7.653}$$

to indicate that the random variable $X$ follows a chi-square distribution. Furthermore, it is common to name the parameter $k$ the "degrees of freedom" and to abbreviate it "df".

The $\chi^2$ distribution is therefore a special case of the Gamma distribution, and by taking $k = 2$ we also find the exponential distribution (see above) with $\lambda = 1/2$:

$$P_2(x) = \frac{1}{2\Gamma(1)}e^{-\frac{x}{2}}\cdot 1_{[0,+\infty[}(x) = \frac{1}{2}e^{-\frac{x}{2}}\cdot 1_{[0,+\infty[}(x) \tag{7.654}$$

Moreover, since (see section Differential And Integral Calculus):

$$\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi} \tag{7.655}$$

the $\chi^2$ distribution with $k$ equal to unity can be written as:

$$P_1(x) = \frac{1}{2^{1/2}\Gamma(1/2)}x^{\frac{1}{2}-1}e^{-\frac{x}{2}}\cdot 1_{]0,+\infty[}(x) = \frac{1}{\sqrt{2\pi x}}e^{-\frac{x}{2}}\cdot 1_{]0,+\infty[}(x) \tag{7.656}$$

Finally, let us finish with a rather important property in the field of statistical tests, which we will use further on, particularly for confidence intervals of rare events and for the famous Fisher method for combining multiple p-values. Indeed, the reader can check in spreadsheet software like Microsoft Excel 14.0.6123 that, for $x \in \mathbb{N}$, the following formulas return the same value:

=POISSON.DIST(x,μ,TRUE)
=1-CHISQ.DIST(2*μ,2*(x+1),TRUE)
=1-GAMMA.DIST(2*μ,x+1,2,TRUE)

and, in the particular case $x = 0$ (where $\chi^2(2)$ is the exponential law with $\lambda = 1/2$):

=1-EXPON.DIST(2*μ,0.5,TRUE)

So we need to prove this relation between the $\chi^2$ and Poisson distributions.
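Let us see it, starting from the Gamma distribution:

$$P_{a,\lambda}(x) = \frac{\lambda^a}{\Gamma(a)}x^{a-1}e^{-\lambda x}\cdot 1_{]0,+\infty[}(x) \tag{7.657}$$

If we set $\lambda = 1/2$ and $a = k/2$, then we have the $\chi^2$ distribution with $k$ degrees of freedom:

$$P_k(x) = \frac{1}{2^{k/2}\Gamma(k/2)}x^{\frac{k}{2}-1}e^{-\frac{x}{2}}\cdot 1_{]0,+\infty[}(x) \tag{7.658}$$

Now remember that we have seen in the section Sequences And Series the following Taylor (Maclaurin) series of $e^t$, of order $n-1$ around $0$ and evaluated at $\lambda$ (here $\lambda$ denotes the Poisson parameter, written $\mu$ in the spreadsheet formulas above), with integral remainder:

$$e^{\lambda} = \sum_{k=0}^{n-1}\frac{\lambda^k}{k!} + \int_0^{\lambda}\frac{(\lambda-t)^{n-1}}{(n-1)!}e^{t}\,dt \tag{7.659}$$

which, with the change of variable $u = \lambda - t$, becomes:

$$e^{\lambda} = \sum_{k=0}^{n-1}\frac{\lambda^k}{k!} + \int_0^{\lambda}\frac{u^{n-1}}{(n-1)!}e^{\lambda-u}\,du \tag{7.660}$$

We multiply by $e^{-\lambda}$:

$$1 = \sum_{k=0}^{n-1}\frac{\lambda^k}{k!}e^{-\lambda} + \int_0^{\lambda}\frac{u^{n-1}}{(n-1)!}e^{-u}\,du \tag{7.661}$$

And therefore:

$$\sum_{k=0}^{n-1}\frac{\lambda^k}{k!}e^{-\lambda} = 1 - \int_0^{\lambda}\frac{u^{n-1}}{(n-1)!}e^{-u}\,du \tag{7.662}$$

Now, let us focus on the term:

$$\int_0^{\lambda}\frac{e^{-u}u^{n-1}}{(n-1)!}\,du \tag{7.663}$$

and make a first change of variable $u = x/2$:

$$\int_0^{\lambda}\frac{e^{-u}u^{n-1}}{(n-1)!}\,du = \int_0^{2\lambda}\frac{1}{(n-1)!}\frac{x^{n-1}}{2^{n-1}}e^{-\frac{x}{2}}\frac{dx}{2} = \int_0^{2\lambda}\frac{1}{2^n(n-1)!}x^{n-1}e^{-\frac{x}{2}}\,dx \tag{7.664}$$

and a second change of variable $n = k/2$ (caution! the $k$ in this change of variable is not the same as the one in the Poisson sum...):

$$\int_0^{2\lambda}\frac{1}{2^n(n-1)!}x^{n-1}e^{-\frac{x}{2}}\,dx = \int_0^{2\lambda}\frac{1}{2^{k/2}\left(\frac{k}{2}-1\right)!}x^{\frac{k}{2}-1}e^{-\frac{x}{2}}\,dx \tag{7.665}$$

However, we have shown in the section Differential And Integral Calculus that if $x$ is a positive integer:

$$x! = \Gamma(x+1) \tag{7.666}$$

Then it comes:

$$\left(\frac{k}{2}-1\right)! = \Gamma\left(\frac{k}{2}\right) \tag{7.667}$$

Finally we have:

$$\int_0^{\lambda}\frac{e^{-u}u^{n-1}}{(n-1)!}\,du = \int_0^{2\lambda}\frac{1}{2^{k/2}\Gamma\left(\frac{k}{2}\right)}x^{\frac{k}{2}-1}e^{-\frac{x}{2}}\,dx \tag{7.668}$$

where we recognize the chi-2 density under the integral! So in the end:

$$\sum_{k=0}^{n-1}\frac{\lambda^k}{k!}e^{-\lambda} = 1 - \int_0^{2\lambda}\frac{1}{2^{k/2}\Gamma\left(\frac{k}{2}\right)}x^{\frac{k}{2}-1}e^{-\frac{x}{2}}\,dx \tag{7.669}$$

with $k = 2n$ degrees of freedom on the right-hand side. Taking $n - 1$ equal to the Poisson count $x$ (so $k = 2(x+1)$) and $\lambda = \mu$, this explains the formulas given above for the spreadsheet software.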
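The identity (7.669) between the Poisson cumulative distribution and the chi-square tail can be checked numerically. This Python sketch (standard library only, illustrative names) approximates the $\chi^2$ CDF by a midpoint rule rather than a library routine, so it relies only on `math.gamma`, `math.exp` and `math.factorial`.

```python
import math


def poisson_cdf(x, mu):
    """P(N <= x) for N following Poisson(mu): the left-hand sum of (7.669) with n - 1 = x."""
    return sum(mu**j * math.exp(-mu) / math.factorial(j) for j in range(x + 1))


def chi2_pdf(t, k):
    """Chi-square density (7.658) with k degrees of freedom."""
    if t <= 0.0:
        return 0.0
    return t ** (k / 2 - 1) * math.exp(-t / 2) / (2 ** (k / 2) * math.gamma(k / 2))


def chi2_cdf(z, k, n=100_000):
    """Midpoint-rule approximation of the chi-square CDF: integral of chi2_pdf over [0, z]."""
    h = z / n
    return h * sum(chi2_pdf((i + 0.5) * h, k) for i in range(n))


# Right-hand side of (7.669): k = 2n = 2(x + 1) degrees of freedom, upper bound 2 * mu
for x, mu in [(0, 1.0), (3, 2.5), (7, 6.0)]:
    print(x, mu, poisson_cdf(x, mu), 1.0 - chi2_cdf(2 * mu, 2 * (x + 1)))
```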
7.6.20 Student Distribution

The "Student distribution" (or "Student's law") of parameter $k$ is defined by the relation:

$$P_k(t) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\left(\frac{k}{2}\right)}\left(1 + \frac{t^2}{k}\right)^{-\frac{k+1}{2}} \tag{7.670}$$

with $k$ being the number of degrees of freedom of the $\chi^2$ distribution underlying the construction of the Student function, as we will see. Let us indicate that this distribution can be obtained in Microsoft Excel 11.8346 using the TDIST() function, and its inverse with TINV().

It is indeed a density function because it satisfies:

$$\int_{-\infty}^{+\infty}P_k(t)\,dt = 1 \tag{7.671}$$

(this remains to be proved directly but, as we will see, the Student law is built from two distribution functions, so the property holds indirectly...).

Let us see the easiest proof to justify the origin of the Student distribution, which will also be very useful further on in statistical inference and analysis of variance. For this, recall that:

1. If $X$ and $Y$ are two independent random variables with respective densities $f_X$, $f_Y$, the distribution of the pair $(X, Y)$ has a density $f$ satisfying (axiom of probabilities!):
$$f(x, y) = f_X(x)f_Y(y) \tag{7.673}$$
2. The distribution $\mathcal{N}(0,1)$ is given by (see above):
$$f(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} \tag{7.674}$$
3. The distribution $\chi^2_n$ is given by (see above):
$$f(y) = \frac{1}{2^{n/2}\Gamma(n/2)}y^{\frac{n}{2}-1}e^{-\frac{y}{2}} \tag{7.675}$$
for $y > 0$ and $n \ge 1$.
4. The function $\Gamma$ is defined for all $a > 0$ by (see section Differential and Integral Calculus):
$$\Gamma(a) = \int_0^{+\infty}e^{-x}x^{a-1}\,dx \tag{7.676}$$
and satisfies (see section Differential and Integral Calculus):
$$\frac{\Gamma(a-1)}{\Gamma(a)} = \frac{1}{a-1} \tag{7.677}$$
for $a > 1$.

These reminders made, now consider a random variable $X$ following the distribution $\mathcal{N}(0,1)$ and a random variable $Y$ following the distribution $\chi^2_n$. We assume $X$ and $Y$ are independent and we consider the random variable below (historically, it is the study of this variable in the framework of statistical inference that led to the Student distribution; we will go deeper into its origin later):

$$T = \frac{X}{\sqrt{Y/n}} = \frac{\sqrt{n}\,X}{\sqrt{Y}} = \frac{\mathcal{N}(0,1)}{\sqrt{\chi^2_n/n}} \tag{7.678}$$

We will prove that $T$ follows a Student distribution of parameter $n$.

Proof 4.33.4.
Let $F$ and $f$ be respectively the distribution and density functions of $T$, and $f_X$, $f_Y$, $f_{X,Y}$ the density functions of $X$, $Y$ and $(X, Y)$ respectively. Then we have for all $t \in \mathbb{R}$:

$$F(t) = P(T \le t) = P\left(\frac{X}{\sqrt{Y/n}} \le t\right) = \iint_D f_{X,Y}(x, y)\,dx\,dy = \iint_D f_X(x)f_Y(y)\,dx\,dy \tag{7.679}$$

where:

$$D = \left\{(x, y) \in \mathbb{R}\times\mathbb{R}_+^* \;\middle|\; x \le t\sqrt{\frac{y}{n}}\right\} \tag{7.680}$$

the positive and non-zero value imposed on $y$ being due to the fact that $y$ is under a root and, furthermore, in the denominator. Thus:

$$F(t) = \iint_D f_X(x)f_Y(y)\,dx\,dy = \int_0^{+\infty}f_Y(y)\left(\int_{-\infty}^{t\sqrt{y/n}}f_X(x)\,dx\right)dy = \int_0^{+\infty}f_Y(y)\,\Phi\left(t\sqrt{\frac{y}{n}}\right)dy \tag{7.681}$$

where, because $X$ follows a Normal $\mathcal{N}(0,1)$ distribution,

$$\Phi(x) = \int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}e^{-\frac{u^2}{2}}\,du \tag{7.682}$$

is the centered reduced Normal cumulative distribution. Thus, we obtain the density function of $T$ by differentiating $F$:

$$f(t) = F'(t) = \int_0^{+\infty}f_Y(y)\,f_X\left(t\sqrt{\frac{y}{n}}\right)\sqrt{\frac{y}{n}}\,dy \tag{7.683}$$

because (the derivative of a composite function is equal to the derivative of the outer function multiplied by the derivative of the inner function):

$$\frac{d}{dt}\Phi\left(t\sqrt{\frac{y}{n}}\right) = f_X\left(t\sqrt{\frac{y}{n}}\right)\sqrt{\frac{y}{n}} \tag{7.684}$$

Therefore:

$$f(t) = \frac{1}{\sqrt{n}}\int_0^{+\infty}\frac{y^{\frac{n}{2}-1}e^{-\frac{y}{2}}}{2^{n/2}\Gamma(n/2)}\,\frac{e^{-\frac{t^2y}{2n}}}{\sqrt{2\pi}}\sqrt{y}\,dy = \frac{1}{2^{n/2}\Gamma(n/2)\sqrt{2\pi n}}\int_0^{+\infty}y^{\frac{n-1}{2}}e^{-\frac{y}{2}\left(1+\frac{t^2}{n}\right)}\,dy \tag{7.685}$$

By making the change of variable:

$$u = \frac{y}{2}\left(1 + \frac{t^2}{n}\right) \tag{7.686}$$

we get:

$$dy = \frac{2}{1 + t^2/n}\,du, \qquad y = \frac{2u}{1 + t^2/n} \tag{7.687}$$

and:

$$f(t) = \frac{1}{2^{n/2}\Gamma(n/2)\sqrt{2\pi n}}\left(\frac{2}{1+\frac{t^2}{n}}\right)^{\frac{n+1}{2}}\int_0^{+\infty}e^{-u}u^{\frac{n-1}{2}}\,du = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\sqrt{\pi n}}\left(1+\frac{t^2}{n}\right)^{-\frac{n+1}{2}} \tag{7.688}$$

which is the Student distribution of parameter $n$.

Q.E.D.

Let us now determine the mean of the Student distribution. We have:

$$T = \sqrt{n}\,\frac{X}{\sqrt{Y}} \tag{7.689}$$

and, $X$ and $Y$ being independent:

$$E(T) = \sqrt{n}\,E(X)\,E\left(\frac{1}{\sqrt{Y}}\right) \tag{7.690}$$

But $E\left(\frac{1}{\sqrt{Y}}\right)$ exists if and only if $n \ge 2$.
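Indeed, in general:

$$E\left(\frac{1}{\sqrt{Y}}\right) = \int_0^{+\infty}\frac{1}{\sqrt{y}}\frac{y^{\frac{n}{2}-1}e^{-\frac{y}{2}}}{2^{n/2}\Gamma(n/2)}\,dy = \frac{1}{2^{n/2}\Gamma(n/2)}\int_0^{+\infty}y^{\frac{n-3}{2}}e^{-\frac{y}{2}}\,dy \tag{7.691}$$

and for $n = 1$:

$$\int_0^{+\infty}\frac{e^{-\frac{y}{2}}}{y}\,dy \ge \int_0^{1}\frac{e^{-\frac{y}{2}}}{y}\,dy \ge e^{-\frac{1}{2}}\int_0^{1}\frac{dy}{y} = +\infty \tag{7.692}$$

whereas for $n \ge 2$, with the change of variable $y = 2u$:

$$\int_0^{+\infty}y^{\frac{n-3}{2}}e^{-\frac{y}{2}}\,dy = 2^{\frac{n-1}{2}}\int_0^{+\infty}u^{\frac{n-3}{2}}e^{-u}\,du = 2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right) < +\infty \tag{7.693}$$

Thus, for $n = 1$ the mean does not exist. So for $n \ge 2$:

$$E(T) = \sqrt{n}\,\underbrace{E(X)}_{=0}\,E\left(\frac{1}{\sqrt{Y}}\right) = 0 \tag{7.694}$$

Now let us see the value of the variance. We have:

$$V(T) = E(T^2) - E(T)^2 \tag{7.695}$$

First we will discuss the existence of $E(T^2)$. We have trivially:

$$E(T^2) = n\,E\left(\frac{X^2}{Y}\right) = n\,E(X^2)\,E\left(\frac{1}{Y}\right) \tag{7.696}$$

$X$ follows a centered reduced Normal distribution, thus:

$$V(X) = 1 = E(X^2) - E(X)^2 = E(X^2) \;\Rightarrow\; E(X^2) = 1 \tag{7.697}$$

With regard to $E\left(\frac{1}{Y}\right)$ we have:

$$E\left(\frac{1}{Y}\right) = \int_0^{+\infty}\frac{1}{y}\frac{y^{\frac{n}{2}-1}e^{-\frac{y}{2}}}{2^{n/2}\Gamma(n/2)}\,dy = \frac{1}{2^{n/2}\Gamma(n/2)}\int_0^{+\infty}e^{-\frac{y}{2}}y^{\frac{n}{2}-2}\,dy = \frac{1}{2\Gamma(n/2)}\int_0^{+\infty}e^{-u}u^{\frac{n}{2}-2}\,du = \frac{\Gamma\left(\frac{n}{2}-1\right)}{2\Gamma\left(\frac{n}{2}\right)} \tag{7.698}$$

where we made the change of variable $u = y/2$. But the integral defining $\Gamma\left(\frac{n}{2}-1\right)$ converges only if $n \ge 3$. Therefore $E(T^2)$ exists if and only if $n \ge 3$, and its value is, according to the properties of the Euler Gamma function proved in the chapter Differential And Integral Calculus:

$$E(T^2) = n\,E\left(\frac{1}{Y}\right) = n\frac{\Gamma\left(\frac{n}{2}-1\right)}{2\Gamma\left(\frac{n}{2}\right)} = \frac{n}{2\left(\frac{n}{2}-1\right)} = \frac{n}{n-2} \tag{7.699}$$

Therefore for $n \ge 3$:

$$V(T) = \frac{n}{n-2} \tag{7.700}$$

It is also important to note that this law is symmetrical about 0!

Plot example of the Student distribution and cumulative distribution for the parameter $k = 3$:

Figure 4.105 - Student T law (mass and cumulative distribution function)

7.6.21 Fisher Distribution

The "Fisher distribution" (or "Fisher-Snedecor distribution") of parameters $k$ and $l$ is defined by the relation:

$$F_{k,l}(x) = \frac{\Gamma\left(\frac{k+l}{2}\right)}{\Gamma\left(\frac{k}{2}\right)\Gamma\left(\frac{l}{2}\right)}\left(\frac{k}{l}\right)^{\frac{k}{2}}\frac{x^{\frac{k}{2}-1}}{\left(1+\frac{k}{l}x\right)^{\frac{k+l}{2}}} \tag{7.701}$$

if $x > 0$. The parameters $k$ and $l$ are positive integers and correspond to the two degrees of freedom of the underlying chi-square distributions.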
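The Student density (7.688) and the Fisher density (7.701) can be checked numerically: both should integrate to 1, and the Student second moment should equal $n/(n-2)$ for $n \ge 3$. In this Python sketch (hypothetical helper names, standard library only), the Fisher parameters are written $n$, $m$ as in the derivation, integrals over $[0, +\infty)$ are mapped to $[0, 1)$ with $x = u/(1-u)$ and evaluated with a midpoint rule, and the symmetry of the Student law about 0 is used to double the half-line integrals.

```python
import math


def student_pdf(t, n):
    """Student density (7.688) with n degrees of freedom."""
    c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return c * (1.0 + t * t / n) ** (-(n + 1) / 2)


def fisher_pdf(x, n, m):
    """Fisher-Snedecor density (7.701) with n and m degrees of freedom, for x > 0."""
    if x <= 0.0:
        return 0.0
    c = math.gamma((n + m) / 2) / (math.gamma(n / 2) * math.gamma(m / 2)) * (n / m) ** (n / 2)
    return c * x ** (n / 2 - 1) * (1.0 + n * x / m) ** (-(n + m) / 2)


def integral_0_inf(f, n=100_000):
    """Integral of f over [0, +inf) via x = u / (1 - u), dx = du / (1 - u)^2, midpoint rule on [0, 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += f(u / (1.0 - u)) / (1.0 - u) ** 2
    return total * h


dof = 5
mass = 2 * integral_0_inf(lambda t: student_pdf(t, dof))
second = 2 * integral_0_inf(lambda t: t * t * student_pdf(t, dof))
print(mass, second, dof / (dof - 2))  # mass close to 1; E(T^2) = V(T) close to 5/3
print(integral_0_inf(lambda x: fisher_pdf(x, 4, 6)))  # close to 1
```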
This distribution is often denoted by $F_{k,l}$ or by $F(k,l)$ and can be obtained in Microsoft Excel 11.8346 with the FDIST() function. It is indeed a probability distribution because it satisfies the property:

$$\int_0^{+\infty}F_{k,l}(x)\,\mathrm{d}x=1$$

Let us see the simplest proof of where the Fisher distribution comes from. It will also be very useful later in statistical inference and analysis of variance. For this proof, recall that:

1. The $\chi^2_n$ distribution is given by (see above):
$$f(y)=\frac{1}{2^{n/2}\Gamma(n/2)}\,y^{n/2-1}e^{-y/2}$$
for $y\geq 0$ and $n\geq 1$.

2. The Euler Gamma function $\Gamma$ is defined for all $\alpha>0$ by (see section Differential and Integral Calculus):
$$\Gamma(\alpha)=\int_0^{+\infty}e^{-x}x^{\alpha-1}\,\mathrm{d}x$$

Let $X$, $Y$ be two independent random variables following respectively the distributions $\chi^2_n$ and $\chi^2_m$. We consider the random variable:

$$T=\frac{X/n}{Y/m}=\frac{\chi^2_n/n}{\chi^2_m/m}$$

We will prove that the distribution of $T$ is the Fisher-Snedecor distribution with parameters $n$, $m$. For this purpose, let $F$ and $f$ be the cumulative distribution and density functions of $T$, and let $f_X$, $f_Y$, $f$ be the density functions of $X$, $Y$ and $(X,Y)$ respectively. For all $t\in\mathbb{R}$:

$$F(t)=P(T\leq t)=P\left(\frac{X/n}{Y/m}\leq t\right)=\iint_D f(x,y)\,\mathrm{d}x\,\mathrm{d}y=\iint_D f_X(x)f_Y(y)\,\mathrm{d}x\,\mathrm{d}y$$

where:

$$D=\left\{(x,y)\in\mathbb{R}_+^*\times\mathbb{R}_+^*\;\middle|\;x\leq\frac{nt}{m}\,y\right\}$$

The values are required to be positive because $x$ and $y$ are values of chi-square variables. Therefore:

$$F(t)=\int_0^{+\infty}f_Y(y)\left(\int_0^{\frac{nt}{m}y}f_X(x)\,\mathrm{d}x\right)\mathrm{d}y$$

We obtain the density function of $T$ by differentiating $F$, taking into account the inner derivative $\frac{n}{m}y$:

$$f(t)=F'(t)=\frac{n}{m}\int_0^{+\infty}f_Y(y)\,f_X\!\left(\frac{nt}{m}y\right)y\,\mathrm{d}y$$

Then, explicitly, because:

$$f_X(x)=\frac{x^{n/2-1}e^{-x/2}}{2^{n/2}\Gamma(n/2)}\quad\text{and}\quad f_Y(y)=\frac{y^{m/2-1}e^{-y/2}}{2^{m/2}\Gamma(m/2)}$$

we have:

$$\begin{aligned}f(t)&=\frac{n}{m}\int_0^{+\infty}\frac{y^{m/2-1}e^{-y/2}}{2^{m/2}\Gamma(m/2)}\,\frac{\left(\frac{nt}{m}y\right)^{n/2-1}e^{-\frac{nt}{2m}y}}{2^{n/2}\Gamma(n/2)}\,y\,\mathrm{d}y\\&=\frac{\left(\frac{n}{m}\right)^{n/2}t^{n/2-1}}{2^{\frac{n+m}{2}}\Gamma(n/2)\Gamma(m/2)}\int_0^{+\infty}y^{\frac{n+m}{2}-1}e^{-\frac{y}{2}\left(1+\frac{nt}{m}\right)}\mathrm{d}y\end{aligned}$$

By making the change of variable:

$$u=\frac{y}{2}\left(1+\frac{nt}{m}\right)\quad\Rightarrow\quad y=\frac{2u}{1+\frac{nt}{m}},\qquad\mathrm{d}y=\frac{2\,\mathrm{d}u}{1+\frac{nt}{m}}$$

we get:

$$\begin{aligned}f(t)&=\frac{\left(\frac{n}{m}\right)^{n/2}t^{n/2-1}}{2^{\frac{n+m}{2}}\Gamma(n/2)\Gamma(m/2)}\left(\frac{2}{1+\frac{nt}{m}}\right)^{\frac{n+m}{2}}\int_0^{+\infty}u^{\frac{n+m}{2}-1}e^{-u}\,\mathrm{d}u\\&=\frac{\Gamma\left(\frac{n+m}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\Gamma\left(\frac{m}{2}\right)}\left(\frac{n}{m}\right)^{n/2}t^{n/2-1}\left(1+\frac{nt}{m}\right)^{-\frac{n+m}{2}}\end{aligned}$$

This is indeed the Fisher-Snedecor density with parameters $n$ and $m$.

7.6.22 General Folded Normal Distribution

The "folded Normal distribution" is the distribution of the absolute value of a random variable with a Normal distribution¹. As we have mentioned before, the Normal distribution is perhaps the most important in probability and is used to model an incredible variety of random phenomena. Since one may only be interested in the magnitude of a Normally distributed variable, the folded Normal arises in a very natural way, especially in Finance and Industrial Engineering (Design of Experiments). The name stems from the fact that the probability measure of the Normal distribution on $(-\infty,0]$ is folded over to $[0,+\infty)$.

¹ The majority of the text below comes from http://www.math.uah.edu/stat/

Definitions (#96): Suppose that $X$ has a Normal distribution with mean $\mu\in\mathbb{R}$ and standard deviation $\sigma\in(0,+\infty)$. Then $Y=|X|$ has the folded Normal distribution with parameters $\mu$ and $\sigma$.

Suppose that $Z$ follows the standard Normal distribution. Recall that $Z$ then has probability density function $\phi$ and distribution function $\Phi$ given by:

$$\phi(z)=\frac{1}{\sqrt{2\pi}}e^{-z^2/2},\qquad\Phi(z)=\int_{-\infty}^{z}\phi(x)\,\mathrm{d}x=\int_{-\infty}^{z}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,\mathrm{d}x$$

with $z\in\mathbb{R}$. If $\mu\in\mathbb{R}$ and $\sigma\in(0,+\infty)$, then $X=\mu+\sigma Z$ has the Normal distribution with mean $\mu$ and standard deviation $\sigma$. It is therefore clear that:

$$Y=|X|=|\mu+\sigma Z|$$

has the folded Normal distribution with parameters $\mu$ and $\sigma$.

Now let us determine the cumulative distribution function (CDF) of such a variable! For $y\in[0,+\infty)$:

$$F(y)=P(Y\leq y)=P(|X|\leq y)=P(|\mu+\sigma Z|\leq y)=P(-y\leq\mu+\sigma Z\leq y)=P\left(\frac{-y-\mu}{\sigma}\leq Z\leq\frac{y-\mu}{\sigma}\right)=\Phi\left(\frac{y-\mu}{\sigma}\right)-\Phi\left(\frac{-y-\mu}{\sigma}\right)$$

Since $\Phi(-z)=1-\Phi(z)$, we have:

$$F(y)=\Phi\left(\frac{y-\mu}{\sigma}\right)+\Phi\left(\frac{y+\mu}{\sigma}\right)-1=\frac{1}{\sigma\sqrt{2\pi}}\int_0^{y}\left[e^{-\frac{1}{2}\left(\frac{x+\mu}{\sigma}\right)^2}+e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\right]\mathrm{d}x$$

We cannot compute the quantile function $F^{-1}$ in closed form, but its values can be approximated numerically. It follows immediately that $Y$ has the probability density function $f$ given by:

$$f(y)=\frac{1}{\sigma\sqrt{2\pi}}\left[e^{-\frac{1}{2}\left(\frac{y+\mu}{\sigma}\right)^2}+e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2}\right]$$

As we know, this follows from differentiating the CDF with respect to $y$!

Now, as always in this book, we will focus only on what we need for the applications in the other chapters! Since we do not need the moments of the folded Normal distribution, we will not calculate them. The only purpose of the above development was to build the tools needed to introduce a special case of the folded Normal distribution.

7.6.22.1 Half-normal distribution

In probability theory and statistics, the "half-normal distribution" is a special case of the folded Normal distribution. Let $X$ follow an ordinary Normal distribution $\mathcal{N}(0,\sigma^2)$. Then $Y=|X|$ follows a half-Normal distribution. Thus, the half-normal distribution is an ordinary Normal distribution with mean $\mu=0$, folded at its mean.

Thus, let $Y=|\sigma Z|=\sigma|Z|$, where $Z$ has a standard Normal distribution and $\sigma\in(0,+\infty)$. Clearly $\sigma$ is a scale parameter, unlike the case of the general folded Normal distribution. When $\sigma=1$, $Y=|Z|$ has the "standard half-normal distribution".
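The CDF and density of the folded Normal distribution translate directly into code. The Python sketch below is for illustration, and its function names are our own. It writes $\Phi$ with `math.erf`, using $\Phi(z)=\frac{1}{2}\left(1+\operatorname{erf}(z/\sqrt{2})\right)$.

Since $F^{-1}$ has no closed form, the sketch approximates the quantile by bisection. Bisection is just one simple root-finding choice, assumed here for clarity rather than speed. Setting $\mu=0$ gives the half-normal case.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Phi(z) written with the error function: (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def folded_normal_cdf(y: float, mu: float, sigma: float) -> float:
    """F(y) = Phi((y - mu)/sigma) + Phi((y + mu)/sigma) - 1 for y >= 0, else 0."""
    if y < 0.0:
        return 0.0
    return std_normal_cdf((y - mu) / sigma) + std_normal_cdf((y + mu) / sigma) - 1.0


def folded_normal_pdf(y: float, mu: float, sigma: float) -> float:
    """f(y) = [exp(-((y+mu)/sigma)^2/2) + exp(-((y-mu)/sigma)^2/2)] / (sigma sqrt(2 pi))."""
    if y < 0.0:
        return 0.0
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return c * (math.exp(-0.5 * ((y + mu) / sigma) ** 2) + math.exp(-0.5 * ((y - mu) / sigma) ** 2))


def folded_normal_quantile(p: float, mu: float, sigma: float) -> float:
    """Approximate F^{-1}(p) by bisection, since no closed form exists."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, abs(mu) + 40.0 * sigma  # F(hi) equals 1 to machine precision
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if folded_normal_cdf(mid, mu, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    mu, sigma = 1.0, 2.0
    q = folded_normal_quantile(0.9, mu, sigma)
    print(q, folded_normal_cdf(q, mu, sigma))  # the second value is close to 0.9
    # Half-normal special case (mu = 0): the density at 0 equals 2 / (sigma sqrt(2 pi)).
    print(folded_normal_pdf(0.0, 0.0, sigma))
```

Differentiating `folded_normal_cdf` numerically reproduces `folded_normal_pdf`, which mirrors the remark in the text that the density is the derivative of the CDF.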
As the half-Normal distribution is just a special case of the folded Normal distribution with $\mu=0$, it follows immediately that:

$$f(y)=\frac{2}{\sigma\sqrt{2\pi}}e^{-\frac{y^2}{2\sigma^2}}=\frac{1}{\sigma}\sqrt{\frac{2}{\pi}}\,e^{-\frac{y^2}{2\sigma^2}}$$

with $y\in\mathbb{R}_+$. This distribution is sometimes denoted $\mathcal{HN}(0,\sigma^2)$.

What interests us for the other chapters of this book (especially Design of Experiments) are its moments! To calculate them, first remember that:

$$\phi(z)=\frac{1}{\sqrt{2\pi}}e^{-z^2/2}$$

with $z\in\mathbb{R}$. Therefore it is immediate that:

$$\phi'(z)=-z\,\phi(z)$$

Also remember that we have already proved that:

$$E(Z)=0,\qquad E(Z^2)=V(Z)=1$$

We first need to determine the moments of the standard Normal distribution! So for $n\in\mathbb{N}^+$:

$$E(Z^{n+1})=\int_{-\infty}^{+\infty}z^{n+1}\phi(z)\,\mathrm{d}z=\int_{-\infty}^{+\infty}z^{n}\,z\,\phi(z)\,\mathrm{d}z=-\int_{-\infty}^{+\infty}z^{n}\phi'(z)\,\mathrm{d}z$$

Now we integrate by parts (see section Differential and Integral Calculus), with $u=z^n$ and $\mathrm{d}v=\phi'(z)\,\mathrm{d}z$, to get:

$$E(Z^{n+1})=-\Big[z^{n}\phi(z)\Big]_{-\infty}^{+\infty}+\int_{-\infty}^{+\infty}n\,z^{n-1}\phi(z)\,\mathrm{d}z=0+n\,E(Z^{n-1})$$

Therefore, for $n\in\mathbb{N}$ with $n\geq 1$:

$$E(Z^{n+1})=n\,E(Z^{n-1})$$

The moments of the standard Normal distribution are now easy to compute. Starting from $E(Z)=0$ and $E(Z^2)=1$, we get:

$$\begin{aligned}E(Z^3)&=E(Z^{2+1})=2\,E(Z^{2-1})=2\,E(Z)=0\\E(Z^4)&=E(Z^{3+1})=3\,E(Z^{2})=3\cdot 1=1\cdot 3=3\\E(Z^5)&=E(Z^{4+1})=4\,E(Z^{3})=4\cdot 0=0\\E(Z^6)&=E(Z^{5+1})=5\,E(Z^{4})=5\cdot 3=1\cdot 3\cdot 5=15\\E(Z^7)&=E(Z^{6+1})=6\,E(Z^{5})=6\cdot 0=0\\E(Z^8)&=E(Z^{7+1})=7\,E(Z^{6})=7\cdot 15=1\cdot 3\cdot 5\cdot 7=105\end{aligned}$$

Therefore we see that for the odd powers, that is to say $Z^{2n+1}$ with $n\in\mathbb{N}$:

$$E(Z^{2n+1})=0$$

and for the even powers:

$$E(Z^{2n})=1\cdot 3\cdot\ldots\cdot(2n-1)=\frac{(2n)!}{n!\,2^n}$$

It follows for $X=\sigma Z\sim\mathcal{N}(0,\sigma^2)$ (just check with the special cases $n=0$ and $n=1$):

$$E(X^{2n+1})=0,\qquad E(X^{2n})=\sigma^{2n}\,\frac{(2n)!}{n!\,2^n}$$

The moments of the half-normal distribution can now be computed explicitly. First, by construction the even-order moments of $Y$ are the same as the even-order moments of $\sigma Z$, since $Y^{2n}=(\sigma|Z|)^{2n}=(\sigma Z)^{2n}$. Hence:

$$E(Y^{2n})=\sigma^{2n}\,\frac{(2n)!}{n!\,2^n}$$

For the odd-order moments we must use the density (see above):

$$f(y)=\frac{1}{\sigma}\sqrt{\frac{2}{\pi}}\,e^{-\frac{y^2}{2\sigma^2}}$$

with $y\in\mathbb{R}_+$. Therefore, by definition:

$$E(Y^{2n+1})=\int_0^{+\infty}y^{2n+1}f(y)\,\mathrm{d}y=\frac{1}{\sigma}\sqrt{\frac{2}{\pi}}\int_0^{+\infty}y^{2n+1}e^{-\frac{y^2}{2\sigma^2}}\,\mathrm{d}y$$

Now we make the change of variable $u=y^2/(2\sigma^2)$. First we have (this is obvious):

$$y^2=2\sigma^2u$$

and:

$$\mathrm{d}u=\frac{y}{\sigma^2}\,\mathrm{d}y\quad\Rightarrow\quad\mathrm{d}y=\frac{\sigma^2}{y}\,\mathrm{d}u$$

Therefore we have so far:

$$E(Y^{2n+1})=\frac{1}{\sigma}\sqrt{\frac{2}{\pi}}\int_0^{+\infty}y^{2n+1}e^{-u}\,\frac{\sigma^2}{y}\,\mathrm{d}u=\sigma\sqrt{\frac{2}{\pi}}\int_0^{+\infty}\left(y^2\right)^{n}e^{-u}\,\mathrm{d}u$$

So finally:

$$E(Y^{2n+1})=\sigma\sqrt{\frac{2}{\pi}}\int_0^{+\infty}(2\sigma^2u)^{n}e^{-u}\,\mathrm{d}u=\sigma^{2n+1}2^{n}\sqrt{\frac{2}{\pi}}\int_0^{+\infty}u^{n}e^{-u}\,\mathrm{d}u$$

We recognize in this expression the Euler Gamma function integral (see section Differential and Integral Calculus)! Therefore it is immediate that:

$$E(Y^{2n+1})=\sigma^{2n+1}2^{n}\sqrt{\frac{2}{\pi}}\,\Gamma(n+1)=\sigma^{2n+1}2^{n}\sqrt{\frac{2}{\pi}}\,n!$$

So, in summary:

$$E(Y^{2n})=\sigma^{2n}\,\frac{(2n)!}{n!\,2^n},\qquad E(Y^{2n+1})=\sigma^{2n+1}2^{n}\sqrt{\frac{2}{\pi}}\,n!$$

So finally we get the result we need for some properties of Brownian motion in finance:

$$E(Y)=E(Y^{2\cdot 0+1})=\sigma\sqrt{\frac{2}{\pi}},\qquad V(Y)=E(Y^2)-\left(E(Y)\right)^2=\sigma^2-\frac{2}{\pi}\sigma^2=\sigma^2\left(1-\frac{2}{\pi}\right)$$

But one property is still missing, this time for our needs in the section of Industrial Engineering (Design of Experiments): the value of the median $M_e$ of the half-normal distribution.
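Before computing $M_e$ analytically, we can check the moment formulas obtained above numerically. The following Python sketch is an illustration, not the book's method. It compares the closed forms for $E(Y^{2n})$ and $E(Y^{2n+1})$ with a Simpson integration of $y^k f(y)$. The integration is truncated at $40\sigma$, an assumption that is harmless given the Gaussian tail.

The sketch also finds the median numerically as the root of $F(M_e)=0.5$, where the half-normal CDF is the folded Normal CDF with $\mu=0$, namely $F(m)=2\Phi(m/\sigma)-1=\operatorname{erf}\left(m/(\sqrt{2}\sigma)\right)$. The function names are hypothetical.

```python
import math


def half_normal_moment(k: int, sigma: float = 1.0) -> float:
    """Closed-form E(Y^k) for Y = sigma * |Z|, Z standard Normal."""
    if k < 0:
        raise ValueError("k must be a non-negative integer")
    if k % 2 == 0:
        n = k // 2
        # even order: sigma^(2n) (2n)! / (n! 2^n)
        return sigma**k * math.factorial(2 * n) / (math.factorial(n) * 2**n)
    n = (k - 1) // 2
    # odd order: sigma^(2n+1) 2^n sqrt(2/pi) n!
    return sigma**k * 2**n * math.sqrt(2.0 / math.pi) * math.factorial(n)


def half_normal_moment_numeric(k: int, sigma: float = 1.0, steps: int = 20_000) -> float:
    """Same moment by Simpson integration of y^k f(y) over [0, 40 sigma]."""
    def integrand(y: float) -> float:
        return y**k * math.sqrt(2.0 / math.pi) / sigma * math.exp(-y * y / (2.0 * sigma**2))

    a, b = 0.0, 40.0 * sigma
    h = (b - a) / steps
    total = integrand(a) + integrand(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3


def half_normal_median(sigma: float = 1.0) -> float:
    """Solve erf(M / (sqrt(2) sigma)) = 1/2 for M by bisection."""
    lo, hi = 0.0, 10.0 * sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erf(mid / (math.sqrt(2.0) * sigma)) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for k in range(7):
        print(k, half_normal_moment(k), half_normal_moment_numeric(k))
    print(half_normal_median(1.0))  # about 0.6745
```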
By definition of the median $M_e$, it follows that:

$$0.50=F(M_e)=\int_0^{M_e}f(y)\,\mathrm{d}y=\frac{1}{\sigma}\sqrt{\frac{2}{\pi}}\int_0^{M_e}e^{-\frac{y^2}{2\sigma^2}}\,\mathrm{d}y$$

Substituting $u=y/(\sqrt{2}\,\sigma)$ we have:

$$0.50=F(M_e)=\frac{2}{\sqrt{\pi}}\int_0^{M_e/(\sqrt{2}\sigma)}e^{-u^2}\,\mathrm{d}u$$

We recognize here the error function (see section Thermodynamics). Therefore:

$$0.50=F(M_e)=\operatorname{erf}\left(\frac{M_e}{\sqrt{2}\,\sigma}\right)$$

and:

$$M_e=\sqrt{2}\,\sigma\,\operatorname{erf}^{-1}(0.50)$$

Microsoft Excel has no built-in inverse error function. However, since $\operatorname{erf}(x/\sqrt{2})=2\Phi(x)-1$, the previous condition is equivalent to $\Phi(M_e/\sigma)=0.75$, that is $M_e=\sigma\,\Phi^{-1}(0.75)$. Excel gives =NORMSINV(0.75) = 0.674489750.

Be careful not to use =ERFC(0.5) = 0.479500122 here. That is the complementary error function evaluated at 0.5, not $\operatorname{erf}^{-1}(0.5)\approx 0.476936$, even though the two values happen to be close. Therefore:

$$M_e=\sqrt{2}\cdot 0.476936\,\sigma\approx 0.6745\,\sigma\approx 0.67\,\sigma$$

The technique that we will see in the section Industrial Engineering therefore makes the approximation:

$$1.5\cdot M_e\cong\sigma$$

(indeed $1.5\times 0.6745\approx 1.01$)... it's engineering...

7.6.23 Benford Distribution

This distribution was first discovered in 1881 by Simon Newcomb, an American astronomer. He noticed that the first pages of logarithm tables (at that time compiled into books) were more worn, and hence more used, than the others. Around 1938, Frank Benford noticed this unequal wear in turn. Believing he was the first to formulate the law that today unduly bears his name, he arrived at the same results after listing tens of thousands of data (lengths of rivers, stock quotes, etc.).

One possible explanation is that we more often need the logarithm of numbers starting with 1 than of numbers starting with 9, implying that the former are "more numerous" than the latter. Although this idea seemed quite implausible to him, Benford began to test his hypothesis. Nothing could be simpler: he studied tables of numerical values and calculated the percentage of occurrence of the leftmost digit (first significant digit).
The results obtained confirmed his intuition:

First digit                | 1    | 2    | 3    | 4   | 5   | 6   | 7   | 8   | 9
Occurrence probability (%) | 30.1 | 17.6 | 12.5 | 9.7 | 7.9 | 6.7 | 5.8 | 5.1 | 4.6

Table 4.27 - Occurrence of a digit following the Benford distribution

From these data, Benford found experimentally that the cumulative probability that a number begins with a digit less than or equal to $n$ ($n\neq 0$) is given by the relation below (we will prove this later):

$$P(D\leq n)=\log_{10}(1+n)$$

This is named the "Benford distribution" (or "Benford's law"). Here is a Maple plot of the previous function:

Figure 4.106 - Plot of the Benford function (cumulative distribution function)

It should be noted that this distribution applies only to lists of "natural" values, that is to say numbers with a physical meaning. It obviously does not work on a list of randomly drawn numbers.

The Benford distribution has been tested on all kinds of tables: lengths of the world's rivers, country areas, election results, grocery store price lists... It holds almost every time. The distribution is said to be independent of the chosen unit. For example, a supermarket price list follows the law just as well with prices expressed in dollars as with the same prices converted into euros.

This strange phenomenon remained unexplained and little studied until quite recently. Then, in 1996, a general proof was given that uses the central limit theorem.

As surprising as it may seem, this distribution has found applications: it is said that the IRS uses it to detect false tax returns. The principle is based on the restriction seen above: Benford's distribution applies only to values with a physical meaning.
Thus, if there is a universal probability distribution $P(n)$ on such numbers, it should be invariant under a change of scale, such that:

$$P(kn)=f(k)P(n)$$

If:

$$\int P(n)\,\mathrm{d}n=1$$

then:

$$\int P(kn)\,\mathrm{d}n=\frac{1}{k}$$

and the normalization of the distribution gives:

$$f(k)=\frac{1}{k}$$

If we differentiate $P(kn)=f(k)P(n)$ with respect to $k$ we obtain:

$$\frac{\partial}{\partial k}P(kn)=\frac{\mathrm{d}f(k)}{\mathrm{d}k}P(n)\quad\Rightarrow\quad nP'(kn)=P(n)\frac{\mathrm{d}}{\mathrm{d}k}\left(\frac{1}{k}\right)\quad\Rightarrow\quad nP'(kn)=-\frac{P(n)}{k^2}$$

Choosing $k=1$ we have:

$$nP'(n)=-P(n)$$

This differential equation has for solution (up to a multiplicative constant):

$$P(n)=\frac{1}{n}$$

Strictly speaking, this function is not a probability density (its integral diverges). Moreover, physical and human laws impose limits. So we have to normalize this distribution with respect to an arbitrary reference interval. Thus, for numbers written in base 10 (ten digits in total: 0, 1, ..., 9), the probability that the first nonzero digit is $D$ is given by:

$$P_D=\frac{\displaystyle\int_D^{D+1}P(n)\,\mathrm{d}n}{\displaystyle\int_1^{10}P(n)\,\mathrm{d}n}$$

The limits of the integral in the denominator go from 1 to 10 because the value zero is excluded. The integral in the denominator gives:

$$\int_1^{10}P(n)\,\mathrm{d}n=\int_1^{10}\frac{\mathrm{d}n}{n}=\ln(10)-\ln(1)=\ln(10)$$

The integral in the numerator gives:

$$\int_D^{D+1}P(n)\,\mathrm{d}n=\int_D^{D+1}\frac{\mathrm{d}n}{n}=\ln(D+1)-\ln(D)=\ln\left(\frac{D+1}{D}\right)$$

Finally:

$$P_D=\frac{\ln\left(\frac{D+1}{D}\right)}{\ln(10)}=\frac{\ln\left(1+\frac{1}{D}\right)}{\ln(10)}$$

By the properties of logarithms (see section Functional Analysis) we have:

$$P_D=\log_{10}\left(1+\frac{1}{D}\right)$$

However, Benford's distribution applies not only to scale-invariant data but also to numbers from arbitrary sources. Explaining this case requires a more rigorous investigation using the central limit theorem. This proof was only given in 1996 by T. Hill, with an approach using the distribution of distributions.
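The Benford probabilities are easy to tabulate. The Python sketch below is an illustration, and the function names are ours. It does three things:

- computes $P_D=\log_{10}(1+1/D)$;
- checks that the cumulative sums telescope to $\log_{10}(n+1)$;
- compares the law with the leading digits of $2^1,\dots,2^{1000}$.

That sequence is an assumption outside the book's examples, since it is not physical data. It is a classic sequence whose leading digits are known to follow Benford's law, because $\log_{10}2$ is irrational.

```python
import math
from collections import Counter


def benford_pmf(d: int) -> float:
    """P(first significant digit = d) = log10(1 + 1/d), for d = 1..9."""
    if not 1 <= d <= 9:
        raise ValueError("the first significant digit lies between 1 and 9")
    return math.log10(1.0 + 1.0 / d)


def benford_cdf(n: int) -> float:
    """P(first digit <= n); the sum telescopes to log10(n + 1)."""
    return sum(benford_pmf(d) for d in range(1, n + 1))


def leading_digit_frequencies(numbers) -> dict:
    """Observed frequency of each leading digit 1..9 among positive integers."""
    counts = Counter(int(str(x)[0]) for x in numbers)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


if __name__ == "__main__":
    observed = leading_digit_frequencies(2**k for k in range(1, 1001))
    for d in range(1, 10):
        print(d, f"{100 * benford_pmf(d):.1f} %", f"{100 * observed[d]:.1f} %")
```

Rounded to one decimal, `benford_pmf` reproduces the percentages of Table 4.27.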
To summarize an important part of everything we have seen so far, the figure below is very useful: it shows the relations between 76 common univariate distributions (57 continuous and 19 discrete):

Figure 4.107 - Relations between distributions (Source: AMS Lawrence M. Leemis and Jacquelyn T. McQueston)

7.7 Likelihood Estimators

What follows is of extreme importance in the field of statistics and is widely used in practice, so pay attention! Besides using this technique in this chapter, we will find it in the chapter of Numerical Methods for advanced and generalized linear regression. We will also find it in the chapter of Industrial Engineering, in the context of the parametric estimation of reliability.

We assume that we have observations $x_1,x_2,x_3,\dots,x_n$ which are realizations of independent random variables $X_1,X_2,X_3,\dots,X_n$. These variables are "unbiased" in the sense that they are randomly selected from a batch, and they all follow the same, unknown, probability distribution. Suppose we proceed by trial and error to estimate the unknown probability distribution $P$.
One way to proceed is to ask whether the observations $x_1,x_2,x_3,\dots,x_n$ had a high probability of occurring under this arbitrary probability distribution $P$. For this, we need to calculate the joint probability that the observations $x_1,x_2,x_3,\dots,x_n$ occurred, with probabilities $p_1,p_2,p_3,\dots,p_n$. Since the variables are independent, this joint probability is equal to (see section Probabilities):

$$\prod_{i=1}^{n}P(X_i=x_i)$$

Here $P$ denotes the assumed probability distribution associated with $p_1,p_2,p_3,\dots,p_n$.

You must admit that, intuitively, it would be particularly awkward to choose a probability distribution (with its parameters!) that minimizes this quantity... Instead, we will seek the probabilities $p_1,p_2,p_3,\dots,p_n$ (or the associated parameters of the probability distribution) that maximize $\prod_{i=1}^{n}P(X_i=x_i)$. That is to say, we seek those that make the observations $x_1,x_2,x_3,\dots,x_n$ the most likely possible.

This leads us to seek the parameter(s) $\theta$ that maximize the quantity:

$$L_n(\theta)=\prod_{i=1}^{n}P_\theta(X_i=x_i)$$

In undergraduate-level problems the parameter $\theta$ is often a first-order moment (mean) or a second-order moment (variance). The quantity $L$ is named the "likelihood". It is a function of the parameter(s) $\theta$ and of the observations $x_1,\dots,x_n$. The value(s) of the parameter(s) $\theta$ that maximize the likelihood $L_n(\theta)$ are called "maximum likelihood estimators" (MLE estimators).

The Normal distribution is a very special but useful case: one of its parameters $\theta$ will be the variance (see a concrete example a little further). It may seem intuitive to the physicist that, to maximize the probability, the standard deviation should be as small as possible, so that the maximum number of events fall in the same interval.
Thus, when, among several possible estimators, we obtain an unbiased one whose variance is the smallest, we talk about a UMVU estimator, for "Uniform Minimum Variance Unbiased", because its own variance is as small as possible. This can be demonstrated using the definition of the Fisher information and the Fréchet (or Rao-Cramér) theorem, although the proof is not very elegant. That theorem uses the Cauchy-Schwarz inequality (see section Vector Calculus) and the analogy between mean and scalar product... This demonstration will not be given in this book.

Let us still work through five small examples, which are very classic, useful and important in industry. In order of importance (i.e. not necessarily in order of ease...), they are:

- the Gauss-Laplace (Normal) distribution;
- the Poisson distribution;
- the binomial distribution (and hence the geometric distribution);
- the Weibull distribution;
- the Gamma distribution.

Remark: These five examples are important because they are used in SPC (statistical process control) in various international companies around the world (see section Industrial Engineering).

7.7.1 Normal Distribution MLE

Let $x_1,\dots,x_n$ be an $n$-sample of identically distributed random variables assumed to follow a Gauss-Laplace (Normal) distribution with parameters $\mu$ and $\sigma^2$. What are the maximum likelihood estimators $\hat{\theta}$ that maximize the likelihood $L_n(\theta)$ of the Normal distribution? We proved earlier that the density of a Gaussian random variable is given by:

$$P(x,\mu,\sigma)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

The likelihood is then given by:

$$L(\mu,\sigma)=\prod_{i=1}^{n}P(x_i,\mu,\sigma)=\frac{1}{\sigma^{n}(\sqrt{2\pi})^{n}}\,e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}$$

Maximizing a function and maximizing its logarithm are equivalent. Therefore the "log-likelihood" is:

$$\ln L(\mu,\sigma)=-\frac{n}{2}\ln(2\pi)-n\ln(\sigma)-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$$

To determine the two estimators of the Normal distribution, let us first fix the standard deviation.
To do this, we differentiate $\ln L(\mu,\sigma)$ with respect to $\mu$ and look for the value at which this derivative equals zero. After simplification, the condition that remains is:

$$\sum_{i=1}^{n}(x_i-\mu)=0$$

Thus, after rearrangement, the maximum likelihood estimator of the mean of the Normal distribution is:

$$\hat{\mu}=\frac{1}{n}\sum_{i=1}^{n}x_i$$

We see that it is simply the arithmetic mean (also named the "sample mean").

Let us now fix the mean. Setting to zero the derivative of $\ln L(\mu,\sigma)$ with respect to $\sigma$ leads us to:

$$\frac{\partial}{\partial\sigma}\ln L(\mu,\sigma)=-\frac{n}{\sigma}+\frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu)^2=0$$

This gives the maximum likelihood estimator of the standard deviation. It applies when the mean is known, under an assumed distribution that is also supposed known!

$$\hat{\sigma}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2}$$

Some people also name it the "Pearson standard deviation"...

Even if it is a little redundant, some readers asked us to show the proof of the estimator of the covariance matrix (and therefore of the correlation matrix). Remember that we proved earlier that, in the bivariate case, the Normal density is:

$$P(\vec{x},\vec{\mu},\Sigma)=\frac{1}{2\pi\sqrt{\det\Sigma}}\exp\left(-\frac{1}{2}(\vec{x}-\vec{\mu})^T\Sigma^{-1}(\vec{x}-\vec{\mu})\right)$$

In fact the relation is the same in the multivariate case of dimension $n$, with $(2\pi)^{n/2}$ instead of $2\pi$. For $T$ independent observations $\vec{x}_1,\dots,\vec{x}_T$, the log-likelihood follows immediately by analogy with the univariate case:

$$\ln L(\vec{\mu},\Sigma)=-\frac{nT}{2}\ln(2\pi)-\frac{T}{2}\ln\det\Sigma-\frac{1}{2}\sum_{i=1}^{T}(\vec{x}_i-\vec{\mu})^T\Sigma^{-1}(\vec{x}_i-\vec{\mu})$$

Each quadratic form is a scalar, so it is equal to its own trace. Since the trace is invariant under cyclic permutations, we can also write:

$$\begin{aligned}\ln L(\vec{\mu},\Sigma)&=-\frac{nT}{2}\ln(2\pi)-\frac{T}{2}\ln\det\Sigma-\frac{1}{2}\sum_{i=1}^{T}\operatorname{tr}\left(\Sigma^{-1}(\vec{x}_i-\vec{\mu})(\vec{x}_i-\vec{\mu})^T\right)\\&=-\frac{nT}{2}\ln(2\pi)-\frac{T}{2}\ln\det\Sigma-\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}S\right)\end{aligned}$$

where, by definition and using the estimator of the mean:

$$S=\sum_{i=1}^{T}(\vec{x}_i-\hat{\vec{\mu}})(\vec{x}_i-\hat{\vec{\mu}})^T$$

We write $-\ln\det\Sigma=\ln\det\Sigma^{-1}$ and differentiate with respect to $\Sigma^{-1}$, using $\partial\ln\det A/\partial A=A^{-1}$ and $\partial\operatorname{tr}(AS)/\partial A=S$ for symmetric matrices. We then deduce that:

$$\frac{\partial\ln L(\vec{\mu},\Sigma)}{\partial\Sigma^{-1}}=\frac{T}{2}\Sigma-\frac{1}{2}S=0$$

and we finally get:

$$\hat{\Sigma}=\frac{1}{T}S$$

However, we have not yet defined what a good estimator is!
What we mean here is:

• If the mean of an estimator $\hat{\theta}$ is equal to the true value of the parameter, $E(\hat{\theta})=\theta$, we say that this estimator is "unbiased", and that is obviously what we want!

• If the mean of an estimator is not equal to the true value of the parameter, then we say that this estimator is "biased", and it is necessarily less good...

In the previous example, the mean estimator is unbiased (this is trivial, as the mean of the arithmetic mean is equal to the mean itself). But what about the variance (and by extension the standard deviation)?

A simple calculation using the linearity of the mean (the random variables being identically distributed) will give us the answer. We treat the case where the theoretical mean is approximated by the estimator of the mean, as in practice (industry); this is the most common case. So, for the mean of the "sample variance", we have:

$$\begin{aligned}E(\hat{\sigma}^2)&=E\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2\right)=E\left(\frac{1}{n}\sum_{i=1}^{n}\left(x_i^2-2x_i\hat{\mu}+\hat{\mu}^2\right)\right)\\&=E\left(\frac{1}{n}\sum_{i=1}^{n}x_i^2-2\hat{\mu}\,\frac{1}{n}\sum_{i=1}^{n}x_i+\hat{\mu}^2\right)=E\left(\frac{1}{n}\sum_{i=1}^{n}x_i^2-2\hat{\mu}^2+\hat{\mu}^2\right)\\&=E\left(\frac{1}{n}\sum_{i=1}^{n}x_i^2-\hat{\mu}^2\right)=\frac{1}{n}\sum_{i=1}^{n}E(x_i^2)-E(\hat{\mu}^2)\end{aligned}$$

However, as the variables are assumed to be identically distributed:

$$E(\hat{\sigma}^2)=\frac{1}{n}\sum_{i=1}^{n}E(x_i^2)-E(\hat{\mu}^2)=E(x^2)-E(\hat{\mu}^2)$$

And we have (Huygens theorem):

$$V(X)=E(X^2)-E(X)^2$$

$$V(\hat{\mu})=E(\hat{\mu}^2)-E(\hat{\mu})^2=E(\hat{\mu}^2)-E(X)^2$$

The second relation can be written this way only because we use the maximum likelihood estimator of the mean (the empirical mean), whose mean is $E(X)$.
Therefore, combining the two relations above with the one before them, we get:

$$E(\hat{\sigma}^2)=E(x^2)-E(\hat{\mu}^2)=\left(V(X)+E(X)^2\right)-\left(V(\hat{\mu})+E(X)^2\right)=V(X)-V(\hat{\mu})$$

and since:

$$V(X)=\sigma^2\quad\text{and}\quad V(\hat{\mu})=\frac{\sigma^2}{n}$$

we finally have:

$$E(\hat{\sigma}^2)=\sigma^2-\frac{\sigma^2}{n}=\left(1-\frac{1}{n}\right)\sigma^2=\frac{n-1}{n}\sigma^2$$

So the bias is:

$$E(\hat{\sigma}^2)-\sigma^2=-\frac{\sigma^2}{n}$$

This is exactly minus the variance of the mean estimator (the square of its standard error). We then say that this estimator has a negative bias (it underestimates the true value!).

We also note that this estimator tends towards an unbiased estimator of the variance (USV) when the number of items tends to infinity ($n\to+\infty$). We say that it is an "asymptotically unbiased estimator". It is important to note that we have thus proved that the mean of the empirical variance tends towards the theoretical variance when $n$ tends to infinity, whether or not the data follow a Normal distribution!

Remark: An estimator is named a "consistent estimator" if it converges in probability, when $n\to+\infty$, towards the true value of the parameter.

By the properties of the mean, we get:

$$E\left(\frac{n}{n-1}\hat{\sigma}^2\right)=\frac{n}{n-1}E(\hat{\sigma}^2)=\frac{n}{n-1}\cdot\frac{n-1}{n}\sigma^2=\sigma^2$$

We then have:

$$\hat{\sigma}=\sqrt{\frac{n}{n-1}\hat{\sigma}^2}=\sqrt{\frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2}=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\hat{\mu})^2}$$

This is simply called the "standard deviation"... It must not be confused with the "standard error", as we shall see later.

So we finally summarize the two important previous results as follows:

1. The "biased maximum likelihood estimator", also named "empirical standard deviation", "sample standard deviation" or "Pearson standard deviation"..., is given by:
$$\hat{\sigma}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2}$$
Depending on the context (and by tradition), this standard deviation is also noted in five other ways, among them $\sigma^*$, $S^*$ and $S_n$. It is sometimes also noted $\sigma$ or $S$, but this is very awkward because it often generates confusion with the unbiased estimator.

2. The "unbiased maximum likelihood estimator", or simply "standard deviation":
$$\hat{\sigma}=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\hat{\mu})^2}$$
As we can see, it is a consistent estimator (when $n$ tends to infinity it tends to the biased maximum likelihood estimator).

Depending on the context (by tradition), this standard deviation is also noted in three other ways. Two of them are often found in tables and in many software packages, and we will use them later in the development of confidence intervals and hypothesis testing! For example, in Microsoft Excel 11.8346 the unbiased estimator is given by the STDEV() function and the biased one by STDEVP().

In total, this makes three estimators for the same indicator! Since in the overwhelming majority of industrial cases the mean is not known, we usually use the last two relations above.

Now comes the vicious part. When we calculate the bias of these two estimators, the first is biased and the second is not, so we tend to use only the latter. Not so fast! We could also consider the variance and precision of an estimator, which are also important criteria for judging the quality of one estimator relative to another. If we were to calculate the variances of the two estimators, we would find that the first, biased one, has a smaller variance than the second, unbiased one! All this to say that bias is far from the only criterion to study when judging the quality of an estimator.

Finally, it is important to remember where the factor $n-1$ in the denominator of the unbiased estimator comes from. It corrects the mean of the biased estimator, which falls short of $\sigma^2$ by exactly $\sigma^2/n$, the squared standard error of the mean!
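The estimators above can be computed in a few lines. In the Python sketch below the names are illustrative. `normal_mle` returns the mean together with the biased (divide by $n$) and unbiased (divide by $n-1$) standard deviations. These agree with the standard library's `statistics.pstdev` and `statistics.stdev`, which play the roles of Excel's STDEVP() and STDEV().

The second function checks the bias result $E(\hat{\sigma}^2)=\frac{n-1}{n}\sigma^2$ exactly, using rational arithmetic. It enumerates every sample of size $n$ drawn with replacement from a small population, $\{0,1,5\}$. This population is an assumed example and deliberately non-Normal, which illustrates that the result does not depend on normality.

```python
import math
import statistics
from fractions import Fraction
from itertools import product


def normal_mle(xs):
    """Return (mean, biased std dev (divide by n), unbiased std dev (divide by n - 1))."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mu = sum(xs) / n
    ss = sum((x - mu) ** 2 for x in xs)
    return mu, math.sqrt(ss / n), math.sqrt(ss / (n - 1))


def mean_of_biased_variance(values, n):
    """Exact E(sigma_hat^2) over all len(values)**n equally likely samples of size n
    drawn with replacement (i.i.d.) from the finite population `values`."""
    total = Fraction(0)
    count = 0
    for sample in product(values, repeat=n):
        m = Fraction(sum(sample), n)
        total += sum((Fraction(x) - m) ** 2 for x in sample) / n
        count += 1
    return total / count


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(normal_mle(data))                                 # (5.0, 2.0, about 2.138)
    print(statistics.pstdev(data), statistics.stdev(data))  # same two standard deviations
    values, n = [0, 1, 5], 3
    pop_mean = Fraction(sum(values), len(values))
    pop_var = sum((Fraction(v) - pop_mean) ** 2 for v in values) / len(values)
    print(mean_of_biased_variance(values, n), Fraction(n - 1, n) * pop_var)  # both 28/9
```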
7.7.2 Poisson Distribution MLE

Using the same method as for the Normal (Gauss-Laplace) distribution, we will seek the maximum likelihood estimator of the Poisson distribution. As a reminder, it is given by:

$$P(x,\mu)=\frac{e^{-\mu}\mu^{x}}{x!}$$

Thus, the likelihood is given by:

$$L(\mu)=\prod_{i=1}^{n}\frac{e^{-\mu}\mu^{x_i}}{x_i!}=e^{-n\mu}\prod_{i=1}^{n}\frac{\mu^{x_i}}{x_i!}$$

Maximizing a function and maximizing its logarithm are equivalent, therefore:

$$\ln L(\mu)=\ln(\mu)\sum_{i=1}^{n}x_i-\sum_{i=1}^{n}\ln(x_i!)-n\mu$$

We now look for its maximum:

$$\frac{\mathrm{d}\ln L(\mu)}{\mathrm{d}\mu}=\frac{1}{\mu}\sum_{i=1}^{n}x_i-n=0$$

This gives the only maximum likelihood estimator:

$$\hat{\mu}=\frac{1}{n}\sum_{i=1}^{n}x_i$$

It is quite normal to find the sample mean in this example. It is the best possible estimator for the parameter of the Poisson distribution, which is also the mean of that distribution.

The standard deviation of this particular distribution is the square root of the mean (see above, in the development of the Poisson distribution). Therefore the maximum likelihood estimator of the standard deviation is:

$$\hat{\sigma}=\sqrt{\hat{\mu}}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}x_i}$$

Remark: We show in the same way identical results for the exponential distribution, which is widely used in preventive maintenance and reliability!

7.7.3 Binomial (and Geometric) Distribution MLE

Using the same method as for the Normal (Gauss-Laplace) and Poisson distributions, we will seek the maximum likelihood estimator of the binomial distribution. Recall that it is given by:

$$P(N,k)=C_N^k\,p^k(1-p)^{N-k}=\frac{N!}{k!\,(N-k)!}\,p^k(1-p)^{N-k}$$

Accordingly, the likelihood is given by:

$$L(p)=\prod_{i=1}^{N}P(x_i,p)=C_N^k\,p^k(1-p)^{N-k}$$

It should be remembered that the factor following the combinatorial term already expresses the successive variables, as we saw in our study of the Bernoulli and binomial distributions. Hence the disappearance of the product in the preceding equality.
Maximizing a function and maximizing its logarithm are equivalent, therefore:

$$\ln L(p)=\ln(C_N^k)+k\ln(p)+(N-k)\ln(1-p)$$

We now look for its maximum:

$$\frac{\partial\ln L(p)}{\partial p}=\frac{k}{p}-\frac{N-k}{1-p}=0$$

The reader may have noticed that the binomial coefficient has disappeared. Therefore, we immediately deduce that the estimator of the binomial distribution is the same as that of the geometric distribution. The condition gives:

$$k(1-p)-p(N-k)=k-kp-pN+pk=k-pN=0$$

from which we derive the maximum likelihood estimator:

$$\hat{p}=\frac{k}{N}$$

This result is quite intuitive if we consider the classic example of a coin with one chance in two of landing on either face. The probability $p$ is then estimated by the number of times $k$ a given face was observed, divided by the total number $N$ of tosses (both faces combined).

Remark: In practice, it is not so easy to apply these estimators! We must carefully consider which ones are most suitable for a given experiment. Ideally, we should also calculate the mean squared error of each estimator, as we already did for the empirical mean earlier. In short... it is a long process of reflection.

7.7.4 Weibull Distribution MLE

In the section of Industrial Engineering we made a very detailed study of the three-parameter Weibull distribution, with its standard deviation and mean, because, as we mentioned, it is widely used in reliability engineering.

Unfortunately, the three parameters of this distribution are unknown in practice. Using estimators, however, we can determine the expressions of two of the three by assuming $\gamma$ equal to zero. This gives the following distribution, named the "two-parameter Weibull distribution":

$$P(x,\beta,\eta)=\frac{\beta}{\eta}\left(\frac{x}{\eta}\right)^{\beta-1}e^{-\left(\frac{x}{\eta}\right)^{\beta}}$$

with, as a reminder, $\beta>0$ and $\eta>0$. The likelihood is then:

$$L(\beta,\eta)=\prod_{i=1}^{n}P(x_i,\beta,\eta)=\left(\frac{\beta}{\eta}\right)^{n}\prod_{i=1}^{n}\left(\frac{x_i}{\eta}\right)^{\beta-1}e^{-\sum_{i=1}^{n}\left(\frac{x_i}{\eta}\right)^{\beta}}$$
Maximizing a function or its logarithm is equivalent, therefore:

$$\ln L(\beta,\eta) = n\ln\frac{\beta}{\eta} - \frac{1}{\eta^{\beta}}\sum_{i=1}^{n}x_i^{\beta} + (\beta-1)\sum_{i=1}^{n}\ln\frac{x_i}{\eta} \tag{7.803}$$

Now we seek to maximize this by remembering that (see section Differential and Integral Calculus):

$$\frac{d}{dx}a^{x} = a^{x}\ln(a) \qquad\text{and}\qquad \frac{d}{dx}a^{-x} = -a^{-x}\ln(a)$$

then:

$$\frac{\partial \ln L(\beta,\eta)}{\partial \beta} = \frac{n}{\beta} + \frac{1}{\eta^{\beta}}\left(\ln(\eta)\sum_{i=1}^{n}x_i^{\beta} - \sum_{i=1}^{n}x_i^{\beta}\ln(x_i)\right) + \sum_{i=1}^{n}\ln\frac{x_i}{\eta} = \frac{n}{\beta} - \frac{1}{\eta^{\beta}}\sum_{i=1}^{n}x_i^{\beta}\left(\ln(x_i)-\ln(\eta)\right) + \sum_{i=1}^{n}\ln\frac{x_i}{\eta} \tag{7.804}$$

$$\frac{\partial \ln L(\beta,\eta)}{\partial \beta} = \frac{n}{\beta} - \frac{1}{\eta^{\beta}}\sum_{i=1}^{n}x_i^{\beta}\ln\frac{x_i}{\eta} + \sum_{i=1}^{n}\ln\frac{x_i}{\eta} = 0 \tag{7.805}$$

And we get for the second parameter:

$$\frac{\partial \ln L(\beta,\eta)}{\partial \eta} = -\frac{n}{\eta} + \frac{\beta}{\eta^{\beta+1}}\sum_{i=1}^{n}x_i^{\beta} + (1-\beta)\frac{n}{\eta} = 0 \tag{7.806}$$

then (multiplying by $\eta/\beta$):

$$\frac{1}{\eta^{\beta}}\sum_{i=1}^{n}x_i^{\beta} - n = 0 \tag{7.807}$$

Finally, summarizing with the correct notation (and in the order of resolution used in practice):

$$\frac{n}{\hat\beta} - \frac{1}{\hat\eta^{\hat\beta}}\sum_{i=1}^{n}x_i^{\hat\beta}\ln\frac{x_i}{\hat\eta} + \sum_{i=1}^{n}\ln\frac{x_i}{\hat\eta} = 0 \qquad\text{and}\qquad \frac{1}{\hat\eta^{\hat\beta}}\sum_{i=1}^{n}x_i^{\hat\beta} - n = 0 \tag{7.808}$$

Solving these equations involves heavy computations, and a priori they cannot be handled in conventional spreadsheet software such as Microsoft Excel or OpenOffice Calc without programming (at least as far as we know...). We then take a different approach by writing our two-parameter Weibull distribution as follows:

$$P(x,\beta,\theta) = \frac{\beta}{\theta}x^{\beta-1}e^{-\frac{x^{\beta}}{\theta}} \tag{7.809}$$

with, as a reminder, $\beta > 0$ and $\theta > 0$ (this amounts to setting $\theta = \eta^{\beta}$ in (7.801)). Therefore the likelihood is given by:

$$L(\beta,\theta) = \prod_{i=1}^{n}P(x_i,\beta,\theta) = \prod_{i=1}^{n}\frac{\beta}{\theta}x_i^{\beta-1}e^{-\frac{x_i^{\beta}}{\theta}} = \left(\frac{\beta}{\theta}\right)^{n}\prod_{i=1}^{n}x_i^{\beta-1}\,e^{-\frac{1}{\theta}\sum_{i=1}^{n}x_i^{\beta}} \tag{7.810}$$

Maximizing a function or its logarithm is equivalent, therefore:

$$\ln L(\beta,\theta) = n\ln\frac{\beta}{\theta} - \frac{1}{\theta}\sum_{i=1}^{n}x_i^{\beta} + (\beta-1)\sum_{i=1}^{n}\ln(x_i) \tag{7.811}$$

Now we seek to maximize this, again remembering that (see section Differential and Integral Calculus):

$$\frac{d}{dx}a^{x} = a^{x}\ln(a) \qquad\text{and}\qquad \frac{d}{dx}a^{-x} = -a^{-x}\ln(a)$$

then:

$$\frac{\partial \ln L(\beta,\theta)}{\partial \beta} = \frac{n}{\beta} - \frac{1}{\theta}\sum_{i=1}^{n}x_i^{\beta}\ln(x_i) + \sum_{i=1}^{n}\ln(x_i) = 0 \tag{7.812}$$

And we have for the second parameter:

$$\frac{\partial \ln L(\beta,\theta)}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{\theta^{2}}\sum_{i=1}^{n}x_i^{\beta} = 0 \tag{7.813}$$

It is then immediate that:

$$\theta = \frac{1}{n}\sum_{i=1}^{n}x_i^{\beta} \tag{7.814}$$

which, injected into the equation:

$$\frac{n}{\beta} - \frac{1}{\theta}\sum_{i=1}^{n}x_i^{\beta}\ln(x_i) + \sum_{i=1}^{n}\ln(x_i) = 0 \tag{7.815}$$

gives:

$$\frac{n}{\beta} - \frac{1}{\frac{1}{n}\sum_{i=1}^{n}x_i^{\beta}}\sum_{i=1}^{n}x_i^{\beta}\ln(x_i) + \sum_{i=1}^{n}\ln(x_i) = 0 \tag{7.816}$$

We get:

$$\frac{n}{\beta} - \frac{n\sum_{i=1}^{n}x_i^{\beta}\ln(x_i)}{\sum_{i=1}^{n}x_i^{\beta}} + \sum_{i=1}^{n}\ln(x_i) = 0 \tag{7.817}$$

simplifying (dividing by $n$):

$$\frac{1}{\beta} = \frac{\sum_{i=1}^{n}x_i^{\beta}\ln(x_i)}{\sum_{i=1}^{n}x_i^{\beta}} - \frac{1}{n}\sum_{i=1}^{n}\ln(x_i) \tag{7.818}$$

The two equations to solve (in order from top to bottom):

$$\frac{\sum_{i=1}^{n}x_i^{\hat\beta}\ln(x_i)}{\sum_{i=1}^{n}x_i^{\hat\beta}} - \frac{1}{\hat\beta} - \frac{1}{n}\sum_{i=1}^{n}\ln(x_i) = 0 \qquad\qquad \hat\theta = \frac{1}{n}\sum_{i=1}^{n}x_i^{\hat\beta} \tag{7.819}$$

can easily be solved with the Goal Seek tool of Microsoft Excel or OpenOffice Calc.

#### 7.7.5 Gamma Distribution MLE

Here we will use a technique named "method of moments" to determine the estimators of the parameters of the Gamma distribution. Suppose that $X_1, \dots, X_n$ are independent and identically distributed random variables following the Gamma distribution with density:

$$f_{\alpha,\lambda}(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda x}\,\mathbb{1}_{[0,+\infty[}(x) \tag{7.820}$$

We seek to estimate $\alpha$ and $\lambda$. For this, we first determine some theoretical moments. The first moment is the expected mean, which, as we have proved before, is given by:

$$E(X) = m_1 = \frac{\alpha}{\lambda} \tag{7.821}$$

and the second moment, the mean of the square of the random variable, is, as we implicitly proved when deriving the variance of the Gamma distribution, given by:

$$E(X^2) = m_2 = \frac{\alpha(\alpha+1)}{\lambda^2} \tag{7.822}$$

We then express the relation between the parameters and the theoretical moments:

$$m_1 = \frac{\alpha}{\lambda} \qquad m_2 = \frac{\alpha(\alpha+1)}{\lambda^2} \tag{7.823}$$

Solving this simple system gives:

$$\alpha = \frac{m_1^2}{m_2 - m_1^2} \qquad \lambda = \frac{m_1}{m_2 - m_1^2} \tag{7.824}$$

Once this system is established, the method of moments consists in using the empirical moments, i.e. for our example the first two, $\hat m_1$ and $\hat m_2$:

$$\hat m_1 = \frac{X_1 + \dots + X_n}{n} \qquad \hat m_2 = \frac{X_1^2 + \dots + X_n^2}{n} \tag{7.825}$$

which we set equal to the true theoretical moments... Therefore, it follows:

$$\hat\alpha = \frac{\hat m_1^2}{\hat m_2 - \hat m_1^2} \qquad \hat\lambda = \frac{\hat m_1}{\hat m_2 - \hat m_1^2} \tag{7.826}$$

### 7.8 Finite Population Correction Factor

Now we prove another result, which will be required in some statistical tests that we will see later. Suppose we have a population of $N$ individuals that we represent by the set $\{1,2,\dots,N\}$, and a random variable $X$ which is a map from $\{1,2,\dots,N\}$ to $\mathbb{R}$. We denote $x_i = X(i)$. The mean of $X$ is thus given by:

$$\mu = \frac{1}{N}\sum_{i=1}^{N}x_i \tag{7.827}$$

Remember that the variance of $X$ is by definition:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2 \tag{7.828}$$

Now we consider the set $E$ of samples of size $n$ taken in $\{1,2,\dots,N\}$ with $0 < n < N$. Each (ordered) sample $(i_1,\dots,i_n)$, drawn without replacement, has a probability of being drawn equal to:

$$\frac{1}{N}\cdot\frac{1}{N-1}\cdots\frac{1}{N-(n-1)} = \frac{(N-n)!}{N!} \tag{7.829}$$

We are interested in the random variable $\bar X$ defined on $E$ and equal to the sample mean. More specifically:

$$\bar X(i_1,\dots,i_n) = \frac{1}{n}\sum_{k=1}^{n}x_{i_k} \tag{7.830}$$

To calculate the variance $V(\bar X)$, we will express $\bar X$ as a sum of random variables. Indeed, if we define the variables $X_k$, with $k = 1,\dots,N$, by:

$$X_k(i_1,\dots,i_n) = \begin{cases} x_k & \text{if } k \in \{i_1,\dots,i_n\} \\ 0 & \text{otherwise}\end{cases} \tag{7.831}$$

we naturally have, by the previous definition (pay attention to the limits of the sum!):

$$\bar X = \frac{1}{n}\sum_{k=1}^{N}X_k \tag{7.833}$$

and thus we get:

$$V(\bar X) = \frac{1}{n^2}\left[\sum_{k=1}^{N}V(X_k) + \sum_{i\neq j}\operatorname{cov}(X_i,X_j)\right] \tag{7.834}$$

The random variables $X_k$ are not pairwise independent; in fact, as we shall see, their covariances are not zero if $N$ is finite. Otherwise (zero covariance), we would find a result already proved earlier:

$$V(\bar X) = \sigma_{\bar X}^2 = \frac{\sigma^2}{n} \tag{7.835}$$

So we need to calculate the variances $V(X_k)$ and the covariances $\operatorname{cov}(X_i,X_j)$. For this purpose we will use the Huyghens relation, and we start by calculating the mean $E(X_k)$:

$$E(X_k) = P(X_k = x_k)\,x_k \tag{7.836}$$

But $P(X_k = x_k)$ is the probability that a sample contains $k$.
This probability is obviously equal to $n/N$ and therefore:

$$E(X_k) = P(X_k = x_k)\,x_k = \frac{n}{N}x_k \tag{7.837}$$

Similarly we obtain:

$$E(X_k^2) = P(X_k = x_k)\,x_k^2 = \frac{n}{N}x_k^2 \tag{7.838}$$

We can therefore calculate the variance $V(X_k)$:

$$V(X_k) = E(X_k^2) - E(X_k)^2 = \frac{n}{N}x_k^2 - \left(\frac{n}{N}x_k\right)^2 = \frac{n(N-n)}{N^2}x_k^2 \tag{7.839}$$

To calculate the covariances, we now need the means $E(X_iX_j)$:

$$E(X_iX_j) = P(X_i = x_i, X_j = x_j)\,x_ix_j \tag{7.840}$$

But $P(X_i = x_i, X_j = x_j)$ is the probability that a sample contains both $i$ and $j$. This probability is obviously given by:

$$\frac{n}{N}\cdot\frac{n-1}{N-1} \tag{7.841}$$

and therefore:

$$E(X_iX_j) = \frac{n(n-1)}{N(N-1)}x_ix_j \tag{7.842}$$

We can now compute the covariance:

$$\operatorname{cov}(X_i,X_j) = E(X_iX_j) - E(X_i)E(X_j) = \frac{n(n-1)}{N(N-1)}x_ix_j - \frac{n^2}{N^2}x_ix_j = -\frac{n(N-n)}{N^2(N-1)}x_ix_j \tag{7.843}$$

We are now able to simplify $V(\bar X)$:

$$V(\bar X) = \frac{1}{n^2}\left[\sum_{k=1}^{N}V(X_k) + \sum_{i\neq j}\operatorname{cov}(X_i,X_j)\right] = \frac{1}{n^2}\left[\frac{n(N-n)}{N^2}\sum_{k=1}^{N}x_k^2 - \frac{n(N-n)}{N^2(N-1)}\sum_{i\neq j}x_ix_j\right] \tag{7.844}$$

Using the Huyghens theorem we get:

$$\sigma^2 = V(X) = E(X^2) - E(X)^2 = E(X^2) - \mu^2 \;\Rightarrow\; \sigma^2 + \mu^2 = E(X^2) \tag{7.845}$$

Using the result proved above and the previous relation:

$$E(X^2) = \frac{1}{N}\sum_{k=1}^{N}x_k^2 = \sigma^2 + \mu^2 \tag{7.846}$$

Therefore:

$$\sum_{k=1}^{N}x_k^2 = N(\sigma^2+\mu^2) \tag{7.847}$$

For the double sum $\sum_{i\neq j}x_ix_j$, we have:

$$\sum_{i\neq j}x_ix_j = \sum_{i=1}^{N}x_i\left(x_1+\dots+x_{i-1}+x_{i+1}+\dots+x_N\right) = \sum_{i=1}^{N}x_i\left(N\mu - x_i\right) = N\mu\sum_{i=1}^{N}x_i - \sum_{i=1}^{N}x_i^2 = N^2\mu^2 - N(\sigma^2+\mu^2) \tag{7.848}$$

Therefore:

$$\begin{aligned} V(\bar X) &= \frac{1}{n^2}\left[\frac{n(N-n)}{N^2}N(\sigma^2+\mu^2) - \frac{n(N-n)}{N^2(N-1)}\left(N^2\mu^2 - N(\sigma^2+\mu^2)\right)\right]\\ &= \frac{N-n}{nN}\left[(\sigma^2+\mu^2) - \frac{N\mu^2 - (\sigma^2+\mu^2)}{N-1}\right]\\ &= \frac{N-n}{nN}\left[(\sigma^2+\mu^2)\frac{N}{N-1} - \frac{N\mu^2}{N-1}\right]\\ &= \frac{N-n}{nN}\cdot\frac{N\sigma^2}{N-1}\end{aligned} \tag{7.849}$$

that is:

$$V(\bar X) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} \tag{7.850}$$

The famous factor:

$$\text{fpc} = \sqrt{\frac{N-n}{N-1}} \tag{7.851}$$

which we have already encountered in our study of the hypergeometric distribution, is named the "finite population correction factor". It multiplies the standard error, $\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\,\text{fpc}$, and therefore reduces it, all the more as $n$ is large relative to $N$.

### 7.9 Confidence Intervals

Until now we have always determined the likelihood estimators or simple estimators (variance, standard deviation) from theoretical statistical distributions or from measurements on an entire population of data.

Definition (#97): A "confidence interval" is a pair of numbers that defines (a posteriori) the range of plausible values, with a given cumulative probability, of a (point) estimator of a given statistical indicator computed from an experimental sample (the range of the statistical indicator usually being calculated using real measured parameters). It is the most common statistical case.

We now naturally turn to asking ourselves what the sample sizes of our measured data must be for our estimators to have some validity (C.I.: confidence interval), or to which confidence interval a given standard deviation or quantile corresponds in a standard Normal distribution (for large samples), a chi-square distribution, a Student distribution or a Fisher distribution (we will see the last two, for small sample sizes, in the section on analysis of variance or ANOVA), when the mean or the variance are known or unknown, respectively, on all or part of the given population.

It is important to know that these confidence intervals often use the central limit theorem, which will be proved later (to avoid any possible frustration), and that the developments we will now carry out are also useful in the field of (a posteriori) Hypothesis Tests, which play a major role in statistics and therefore indirectly in all fields of science!!!
Finally, it may be useful to point out that a large number of organizations (private or institutional) produce false statistics because the assumptions and conditions of use of these confidence intervals (and likewise of hypothesis tests) are not rigorously verified, or are simply omitted, or worse, because the whole base of measurements is not collected according to the rules of the art (unreliable data collection, reproducibility protocols not validated by scientific peers).

The reader should also know that we have put many other detailed proofs of confidence interval techniques, for example those related to regression techniques, in the section on Theoretical Computing.

Remark: The practitioner should be very careful about the calculation of confidence intervals and the use of hypothesis testing in practice. This is why, to avoid trivial errors of usage or interpretation, it is important to refer to international standards such as:

- ISO 2602:1980
- ISO 2854:1976 (Statistical interpretation of data - Techniques of estimation and tests relating to means and variances)
- ISO 3301:1975 (Statistical interpretation of data - Comparison of two means in the case of paired observations)
- ISO 3494:1976 (Statistical interpretation of data - Effectiveness of tests relating to means and variances)
- ISO 5479:1997 (Statistical interpretation of data - Tests for departure from the normal distribution)
- ISO 10725:2000 + ISO 11648-1:2003 + ISO 11648-2:2001 (Sampling plans and procedures for acceptance for control of bulk materials)
- ISO 11453:1996 (Statistical interpretation of data - Tests and confidence intervals relating to proportions)
- ISO 16269-4:2010 (Statistical interpretation of data - Detection and treatment of outliers)
- ISO 16269-6:2005 (Statistical interpretation of data - Determination of statistical tolerance intervals)
- ISO 16269-8:2004 (Statistical interpretation of data - Determination of prediction intervals)
- ISO/TR 18532:2009 (Guidelines for the application of statistical quality and industrial standards)

#### 7.9.1 C.I. on the Mean with known Variance

Let's start with the simplest and most common case, that is, determining the number of individuals needed to have some confidence in the average of the measurements of a random variable assumed to follow a Normal distribution.

First let us recall that we showed at the beginning of this chapter that the standard error (standard deviation of the mean) is, under the assumption of independent and identically distributed variables (i.i.d.):

$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} \tag{7.852}$$

Now, before we go any further, consider $X$ as a random variable following a Normal distribution with mean $\mu$ and standard deviation $\sigma$. We would like the random variable to have, for example, a 95% cumulative probability of lying in a given bounded symmetric interval, which is expressed as follows:

$$P(\mu - \delta \le X \le \mu + \delta) = 0.95 \tag{7.853}$$

Remark: With a 95% confidence level you will therefore be right, a posteriori, 19 times out of 20 (and similarly for any other confidence level, or risk level $\alpha$ = 1 - confidence level, here 5%, that you set in advance). On average your conclusions will therefore be good, but we can never know whether a particular decision is good! If the risk level is very low but the event still occurs, specialists speak of a "large deviation" or a "black swan". The management of outliers is addressed in ISO 16269-4:2010 Detection and treatment of outliers, which any engineer doing business statistics has to follow.

By centering and reducing the random variable:

$$P\left(-\frac{\delta}{\sigma} \le \frac{X-\mu}{\sigma} \le \frac{\delta}{\sigma}\right) = 0.95 \tag{7.854}$$

Let us now write $Y$ for the centered reduced variable:

$$P\left(-\frac{\delta}{\sigma} \le Y \le \frac{\delta}{\sigma}\right) = 0.95 \tag{7.855}$$

Since the standard Normal distribution is symmetric:

$$1 - 2P\left(Y \le -\frac{\delta}{\sigma}\right) = 0.95 \tag{7.856}$$

Therefore:

$$P\left(Y \le -\frac{\delta}{\sigma}\right) = 0.025 \tag{7.857}$$

From there, by reading statistical tables of the standard Normal distribution (or by using simple spreadsheet software), we must satisfy the equality:

$$\frac{\delta}{\sigma} \cong 1.96 \;\Rightarrow\; \delta = 1.96\,\sigma \tag{7.858}$$

which can easily be obtained with Microsoft Excel 11.8346 by using the function:

`=-NORMSINV((1-0.95)/2)`

As is traditionally written in the general case other than 95% ($Z$ being the half-quantile of the chosen threshold of the standard Normal distribution):

$$\delta = Z\sigma \tag{7.859}$$

Now, consider that the variable $X$ on which we wish to make statistical inference is the average (we show later that it follows a Normal distribution). Its standard deviation being $\sigma/\sqrt{n}$ by (7.852), we therefore have:

$$\delta = Z\frac{\sigma}{\sqrt{n}} \tag{7.860}$$

Then we get:

$$n = \frac{Z^2\sigma^2}{\delta^2} \tag{7.861}$$

from which we obviously take (normally...) the next integer value... The latter relation is usually written in the following way, which better highlights the half-width $\delta$ of the confidence interval and the underlying threshold level:

$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{\delta}\right)^2 \tag{7.862}$$

a relation named "sample size estimation by Normal distribution". Thus, we now know the number of individuals needed to obtain a given precision interval $\delta$ (margin of error) around the mean, for a given percentage of measures in this range, assuming the theoretical standard deviation $\sigma$ is known (or imposed) in advance (typically used in quality engineering or surveys).

In other words, we can calculate the number $n$ of individuals to measure to ensure a given confidence interval (associated with the quantile $Z$) of the measured average, assuming the theoretical standard deviation is known (or imposed) and wishing a precision $\delta$ in absolute value on the mean.

However... in reality, the variable $Z$ comes from the central limit theorem (see below), which gives for a large sample size (approximately):

$$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \tag{7.863}$$

Rearranging, we get:

$$\mu = \bar X - Z\frac{\sigma}{\sqrt{n}} \tag{7.864}$$

and as $Z$ can be negative or positive, it is more logical to write this as:

$$\mu = \bar X \pm Z\frac{\sigma}{\sqrt{n}} \tag{7.865}$$

Thus:

$$\bar X - Z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + Z\frac{\sigma}{\sqrt{n}} \tag{7.866}$$

which engineers sometimes write:

$$\text{LCL} \le \mu \le \text{UCL} \tag{7.867}$$

where LCL is the lower confidence limit and UCL the upper confidence limit. This is the Six Sigma terminology (see section Industrial Engineering).

We have seen earlier that for a 95% confidence interval we have $Z = 1.96$. Since the Normal distribution is symmetric:

$$95\% = 1 - 5\% = 1 - \alpha = 1 - 2\times 2.5\% = 1 - 2\times 0.025 \tag{7.868}$$

Thus we finally write the "one-sample Z test":

$$\bar X - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \tag{7.869}$$

where we define, for all tests having the same structure, the "margin of error" by:

$$\text{ME} = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \tag{7.870}$$

As we have already mentioned, and will prove a little further on, the centered reduced arithmetic mean of a series of independent and identically distributed random variables with finite variance asymptotically follows a standard Normal distribution: this is why the confidence interval above is very general! This is also why we sometimes speak of an "asymptotic confidence interval of the mean".

These intervals obviously originate from the fact that in statistics we very often work with samples and not with the entire available population. The selected sample thus affects the value of the point estimator. We then speak of "sampling fluctuation".
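The sample-size relation (7.862) and the one-sample Z interval (7.869) translate directly into a few lines of code. The Python sketch below uses only the standard library (`statistics.NormalDist().inv_cdf` plays the role of Excel's `NORMSINV`); the function names and numerical values are hypothetical, chosen for illustration, and the small simulation simply checks the "19 times out of 20" reading of a 95% interval under the stated assumptions (Normal data, known $\sigma$).

```python
import math
import random
from statistics import NormalDist, fmean


def z_quantile(confidence: float) -> float:
    """Z_{alpha/2}: standard Normal quantile leaving alpha/2 in the right tail."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def sample_size(sigma: float, delta: float, confidence: float = 0.95) -> int:
    """n = (Z_{alpha/2} * sigma / delta)^2, rounded up to the next integer."""
    return math.ceil((z_quantile(confidence) * sigma / delta) ** 2)


def z_interval(data, sigma: float, confidence: float = 0.95):
    """One-sample Z interval: X_bar -/+ ME with ME = Z_{alpha/2} * sigma / sqrt(n)."""
    margin = z_quantile(confidence) * sigma / math.sqrt(len(data))
    x_bar = fmean(data)
    return x_bar - margin, x_bar + margin


def coverage(mu=10.0, sigma=2.0, n=25, trials=10_000, seed=1) -> float:
    """Fraction of simulated Normal samples whose Z interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(data, sigma)
        if low <= mu <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(round(z_quantile(0.95), 4))         # 1.96
    print(sample_size(sigma=2.0, delta=0.5))  # 62
    print(coverage())                         # close to 0.95
```

Rounding up with `math.ceil` implements the "next integer value" mentioned after (7.861).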
In the particular case of a 95% CI (confidence interval), the one-sample Z interval (7.869) is written:

$$\bar X - Z_{0.025}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + Z_{0.025}\frac{\sigma}{\sqrt{n}} \tag{7.871}$$

We sometimes find the inequality (7.869) in the following equivalent notation:

$$\bar X - Z_{\alpha/2}\,\sigma_{\bar X} \le \mu \le \bar X + Z_{\alpha/2}\,\sigma_{\bar X} \tag{7.872}$$

or, more rarely, with the following general notation (valid for all intervals):

$$\bar X - \text{ME} \le \mu \le \bar X + \text{ME} \tag{7.873}$$

where ME stands for "margin of error". We are thus now able to estimate the population sizes needed to obtain a certain confidence level in an outcome, or to estimate the confidence interval containing the theoretical mean knowing the experimental (empirical) average and the maximum likelihood estimator of the standard deviation. We can of course also determine the a posteriori probability that the mean lies outside a given range... (both being widely used in industry).

Finally, note that from the previous result and the stability property of the Normal distribution (shown above), we immediately deduce the following test, which we find in many statistical software packages:

$$Z_{\text{calc}} = \frac{(\bar X_2 - \bar X_1) - (\mu_2 - \mu_1)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \tag{7.874}$$

named "bilateral Z test on the difference of two means" or sometimes "two-sample Z test", with the corresponding confidence interval:

$$(\bar X_2 - \bar X_1) - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_2 - \mu_1 \le (\bar X_2 - \bar X_1) + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{7.875}$$

Two means can be significantly different even though their individual confidence intervals overlap!!!! This is shown by the graph below, obtained with the Minitab 16 software, where the Z test of the difference is significant at 95%:

[Figure 4.108 - Line plot illustrating the overlap of two 95% confidence intervals for the mean (Group 1 and Group 2, P-value < 0.05)]

The two intervals overlap, while the means are significantly different at the 95% confidence level.
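The two-sample Z test (7.874), its interval (7.875), and the fact that two individual intervals can overlap while the difference is significant can be reproduced numerically. The Python sketch below is an illustration under stated assumptions: known standard deviations, and hypothetical summary values (means 9.5 and 9.8, $\sigma = 0.5$, 40 observations per group) that are not the data behind the Minitab graph.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_interval(mean, sigma, n, confidence=0.95):
    """Individual one-sample Z interval for one group (sigma assumed known)."""
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin


def two_sample_z(mean1, sigma1, n1, mean2, sigma2, n2, confidence=0.95, delta0=0.0):
    """Z statistic for mu2 - mu1 = delta0, two-sided p-value, and interval on mu2 - mu1."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = mean2 - mean1
    z_calc = (diff - delta0) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z_calc)))
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    return z_calc, p_value, (diff - z * se, diff + z * se)


if __name__ == "__main__":
    ci1 = z_interval(9.5, 0.5, 40)
    ci2 = z_interval(9.8, 0.5, 40)
    z_calc, p, ci_diff = two_sample_z(9.5, 0.5, 40, 9.8, 0.5, 40)
    print("overlap:", ci1[1] > ci2[0])   # the two individual 95% intervals overlap
    print(round(z_calc, 3), round(p, 4))  # yet |Z| > 1.96 and p < 0.05
    print(ci_diff)                        # interval on mu2 - mu1 excludes 0
```

The individual intervals each have half-width $1.96\,\sigma/\sqrt{n}$, whereas the difference is judged against $1.96\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}$, which is smaller than the sum of the two half-widths; this is why overlap does not imply non-significance.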
Remark: The size of the parent population does not come into the relations developed above, neither in the calculation of the confidence intervals nor in that of the sample size, because it is considered infinite. So be careful not to end up with sample sizes larger than the actual parent population...

#### 7.9.2 C.I. on the Variance with known Mean

Let's start by demonstrating a fundamental property of the chi-square distribution: if a random variable $X$ follows a standard Normal distribution, $X = \mathcal{N}(0,1)$, then its square follows a chi-square distribution with 1 degree of freedom:

$$X^2 = \chi^2(1) \tag{7.876}$$

This result is sometimes named a "Wald statistic", and any statistical test using it directly (we should rather speak of a "test family") can be designated under the name "Wald test" (for a concrete example, see the Cochran-Mantel-Haenszel test in the section on Theoretical Computing).

Proof 4.33.5. To prove this property, it suffices to calculate the density of the random variable $X^2$ with $X = \mathcal{N}(0,1)$. If $X = \mathcal{N}(0,1)$ and we set $Y = X^2$, then for all $y \ge 0$ we get:

$$P(Y \le y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) \tag{7.877}$$

Since the standard Normal distribution is symmetric about 0, we can write:

$$P(Y \le y) = 2P(0 \le X \le \sqrt{y}) \tag{7.878}$$

Denoting by $\Phi$ the cumulative distribution function of the standard Normal distribution, we have:

$$P(Y \le y) = 2\Phi(\sqrt{y}) - 2\times 0.5 = 2\Phi(\sqrt{y}) - 1 \tag{7.879}$$

since:

$$\Phi(0) = P(\mathcal{N}(0,1) \le 0) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{0}e^{-k^2/2}\,dk = 0.5 \tag{7.880}$$

therefore:

$$P(Y \le y) = 2\Phi(\sqrt{y}) - 2\cdot 0.5 = 2\Phi(\sqrt{y}) - 1 \tag{7.881}$$

The cumulative distribution function of the random variable $Y = X^2$ is thus given by:

$$P(Y \le y) = 2\Phi(\sqrt{y}) - 1 \tag{7.882}$$

if $y$ is greater than or equal to zero, and zero if $y$ is less than zero. We will denote this cumulative distribution function by $F_Y(y)$ for the further calculations.
Since the probability density function is the derivative of the cumulative distribution function, and $X$ follows a standard Normal distribution, we have for the random variable $X$:

$$f_X(x) = \Phi'(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2} \tag{7.883}$$

and it then follows for the probability density of $Y$ (which, as a reminder, is the square of $X$):

$$f_Y(y) = \frac{d}{dy}\left(2\Phi(\sqrt{y}) - 1\right) = 2\Phi'(\sqrt{y})\frac{1}{2\sqrt{y}} = \frac{1}{\sqrt{y}}\frac{1}{\sqrt{2\pi}}e^{-y/2} = \frac{1}{\sqrt{2\pi y}}e^{-y/2} \tag{7.884}$$

This last expression is exactly the relation we obtained during our study of the chi-square distribution with the number of degrees of freedom set equal to one. The theorem is therefore proved: if $X$ follows a standard Normal distribution, its square follows a chi-square distribution with 1 degree of freedom:

$$\left(\mathcal{N}(0,1)\right)^2 = \chi^2(1) \tag{7.885}$$

Q.E.D.

This type of relation is used mainly in industrial processes and their control (see section Industrial Engineering).

Now let us open a parenthesis that is quite important for some linear regression software reports, and especially for the curvature test in design of experiments (see section Industrial Engineering). Let us recall that we have:

$$T_n = \frac{\mathcal{N}(0,1)}{\sqrt{\chi^2_n/n}} \tag{7.886}$$

And we have just proved above that:

$$\left(\mathcal{N}(0,1)\right)^2 = \chi^2_1 \tag{7.887}$$

Therefore:

$$T_n^2 = \frac{\left(\mathcal{N}(0,1)\right)^2}{\chi^2_n/n} = \frac{\chi^2_1/1}{\chi^2_n/n} \tag{7.888}$$

And as we have also seen that:

$$F_{n,m} = \frac{\chi^2_n/n}{\chi^2_m/m} \tag{7.889}$$

it follows that (in absolute value):

$$|T_n| = \sqrt{F_{1,n}} \tag{7.890}$$

or, more commonly in practice:

$$T_n^2 = F_{1,n} \tag{7.891}$$

We will now use a result proved during our study of the Gamma distribution. We have indeed seen that the sum of two independent random variables following Gamma distributions with the same parameter $\lambda$ also follows a Gamma distribution whose first parameters add up:

$$X + Y = \gamma(p+q,\lambda) \tag{7.892}$$

As the chi-square distribution is a special case of the Gamma distribution, the same result applies. To be more precise, this amounts to saying: if $X_1, \dots, X_k$ are independent and identically distributed (i.i.d.) random variables $\mathcal{N}(0,1)$, then, extending the above proof where we have shown that:

$$\left(\mathcal{N}(0,1)\right)^2 = \chi^2(1) \tag{7.893}$$

and by this additivity property of the Gamma distribution, the sum of their squares follows a chi-square distribution with $k$ degrees of freedom:

$$\chi^2(k) = X_1^2 + X_2^2 + \dots + X_k^2 \tag{7.894}$$

Thus, the $\chi^2$ distribution with $k$ degrees of freedom is the probability distribution of the sum of the squares of $k$ mutually independent standard Normal variables. This is in fact the so-called linearity property of the chi-square distribution (implicitly the linearity of the Gamma distribution)!

Now let us see another significant property of the chi-square distribution: if $X_1, \dots, X_n$ are independent and identically distributed $\mathcal{N}(\mu,\sigma)$ random variables (thus with the same mean and the same standard deviation, and following a Normal distribution), and if we write the maximum likelihood estimator of the variance as:

$$S_\mu^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i-\mu)^2 \tag{7.895}$$

then the ratio of the random variable $S_\mu^2$ to the variance $\sigma^2$, assumed to be known for the entire population ("the true variance" or "theoretical variance"!), multiplied by the number $n$ of individuals in the sample, follows a chi-square distribution with $n$ degrees of freedom:

$$\chi^2(n) = \chi^2_n = \frac{nS_\mu^2}{\sigma^2} \tag{7.896}$$

This result is named the "Cochran theorem" or "Fisher-Cochran theorem" (in the particular case of Gaussian samples) and thus gives us a distribution for the empirical variances (whose parent law is a Normal distribution!).

Using the variance proved during our study of the chi-square distribution, we have:

$$V\left(\chi^2(n)\right) = 2n \tag{7.897}$$

But $n$ and $\sigma$ are imposed and are therefore considered as constants.
We therefore have:

$$V\left(\frac{nS_\mu^2}{\sigma^2}\right) = \frac{n^2}{\sigma^4}V(S_\mu^2) = V\left(\chi^2(n)\right) = 2n \;\Rightarrow\; V(S_\mu^2) = \frac{2\sigma^4}{n} \tag{7.898}$$

And therefore we have an expression for the standard deviation of the empirical variance if we know the standard deviation of the population:

$$\sigma_{S_\mu^2} = \sigma^2\sqrt{\frac{2}{n}} \tag{7.899}$$

But we proved during our study of estimators that the maximum likelihood estimator of the variance computed around the empirical mean is biased:

$$E\left(\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar X)^2\right) = \frac{n-1}{n}\sigma^2 \tag{7.900}$$

It follows (the number of degrees of freedom becoming $n-1$):

$$\sigma_{S^2} = \frac{n-1}{n}\sigma^2\sqrt{\frac{2}{n-1}} = \sqrt{\frac{2(n-1)}{n^2}}\,\sigma^2 \tag{7.901}$$

From this follows the relation, sometimes important in practice, for the estimator of the standard deviation of... the standard deviation:

(7.902)

Recall that the parent population is said to be "infinite" if the sample is selected with replacement or if the size $N$ of the parent population is much larger than the size $n$ of the sample.

Remarks:

R1. In laboratories, the $X_1, \dots, X_n$ can be seen as a class of individuals of the same product, studied identically by different research teams with instruments of the same precision (identical standard deviation of the measurement).

R2. $S_\mu^2$ is the "inter-class variance", also called "explained variance". It therefore gives a measure of the variance occurring between different laboratories.

What is interesting here is that, from the chi-square distribution and knowing $n$ and the variance $\sigma^2$, it is possible to estimate the inter-class variance (and also the inter-class standard deviation).

To see that this latter property is a generalization of the basic relation:

$$\chi^2(n) = X_1^2 + X_2^2 + \dots + X_n^2 \tag{7.903}$$

it suffices to see that the random variable $nS_\mu^2/\sigma^2$ is a sum of $n$ squares of mutually independent $\mathcal{N}(0,1)$ variables. Indeed, recall that a centered reduced random variable (see our study of the Normal distribution) is given by:

$$Y = \frac{X-\mu}{\sigma} \tag{7.904}$$
Therefore:

$$\frac{nS_\mu^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i-\mu)^2 = \sum_{i=1}^{n}\left(\frac{X_i-\mu}{\sigma}\right)^2 \tag{7.905}$$

However, since the random variables $X_1, \dots, X_n$ are independent and identically distributed according to a Normal distribution, the random variables:

$$\frac{X_1-\mu}{\sigma}, \dots, \frac{X_n-\mu}{\sigma} \tag{7.906}$$

are also independent and identically distributed according to a Normal distribution, but a centered reduced one. Since:

$$\frac{nS_\mu^2}{\sigma^2} = \chi^2(n) \tag{7.907}$$

rearranging we get:

$$\sigma^2 = \frac{nS_\mu^2}{\chi^2(n)} \tag{7.908}$$

So, on the population of measurements, the true variance follows the relation given above. It is therefore feasible to make statistical inference on the standard deviation when the theoretical mean is known (...). Since the chi-square distribution is not symmetric, the only way to make this inference is to use numerical calculations, and we then denote the 95% confidence interval (for example...) as follows (where, as for $Z_{\alpha/2}$, the subscript of $\chi^2$ denotes the probability in the right tail):

$$\frac{nS_\mu^2}{\chi^2_{2.5\%}(n)} \le \sigma^2 \le \frac{nS_\mu^2}{\chi^2_{97.5\%}(n)} \tag{7.909}$$

or, writing $95\% = 1-\alpha$:

$$\frac{nS_\mu^2}{\chi^2_{\alpha/2}(n)} \le \sigma^2 \le \frac{nS_\mu^2}{\chi^2_{1-\alpha/2}(n)} \tag{7.910}$$

the denominators obviously being quantiles of the chi-square distribution. This relation is rarely used in practice, as the theoretical average (mean) is usually not known. In order to avoid confusion, the latter relation is often written as follows:

$$\frac{nS_\mu^2}{\chi^2_{\alpha/2,\,n}} \le \sigma^2 \le \frac{nS_\mu^2}{\chi^2_{1-\alpha/2,\,n}} \tag{7.911}$$

Let's see the most common case:

#### 7.9.3 C.I. on the Variance with empirical Mean

Let us now make statistical inference when the theoretical average of the population (i.e. the mean) is not known. To do this, consider now the sum:

$$\chi^2(n) = \sum_{i=1}^{n}\left(\frac{X_i-\mu}{\sigma}\right)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i-\mu)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left((X_i-\bar X) + (\bar X-\mu)\right)^2 \tag{7.912}$$

where, as a reminder, $\bar X$ is the empirical average (arithmetic mean) of the sample:

$$\bar X = \frac{1}{n}\sum_{i=1}^{n}X_i \tag{7.913}$$

Continuing the development, we have:

$$\chi^2(n) = \frac{1}{\sigma^2}\left[\sum_{i=1}^{n}(X_i-\bar X)^2 + 2(\bar X-\mu)\sum_{i=1}^{n}(X_i-\bar X) + n(\bar X-\mu)^2\right] \tag{7.914}$$

However, we have proved earlier in this chapter that the sum of the deviations from the mean is zero.
So:

$$\chi^2(n) = \frac{1}{\sigma^2}\left[\sum_{i=1}^{n}(X_i-\bar X)^2 + 2(\bar X-\mu)\cdot 0 + n(\bar X-\mu)^2\right] = \sum_{i=1}^{n}\frac{(X_i-\bar X)^2}{\sigma^2} + \left(\frac{\bar X-\mu}{\sigma/\sqrt{n}}\right)^2 \tag{7.915}$$

and by taking back the unbiased estimator of the variance of the Normal distribution (we change notation to respect tradition and to differentiate the empirical average from the theoretical mean):

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar X)^2 \tag{7.916}$$

Thus:

$$\chi^2(n) = \frac{n-1}{n-1}\sum_{i=1}^{n}\frac{(X_i-\bar X)^2}{\sigma^2} + \left(\frac{\bar X-\mu}{\sigma/\sqrt{n}}\right)^2 = \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar X-\mu}{\sigma/\sqrt{n}}\right)^2 \tag{7.917}$$

or, with another common notation:

$$\chi^2(n) = \frac{(n-1)S^2}{\sigma^2} + \frac{(\bar X-\mu)^2}{\sigma_{\bar X}^2} \tag{7.918}$$

Since the quantity inside the second square follows a standard Normal distribution (so that the second term follows a $\chi^2(1)$ distribution), removing it we get, by the property proved above about the chi-square distribution:

$$\chi^2(n-1) = \frac{(n-1)S^2}{\sigma^2} \tag{7.919}$$

These developments this time allow us to make inferences about the variance of a $\mathcal{N}(\mu,\sigma)$ distribution when the parameters $\mu$ and $\sigma$ of the parent population are both unknown. It is this result that gives us, for example, the confidence interval:

$$\frac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)} \tag{7.920}$$

when the theoretical average (mean) $\mu$ is unknown. And again, to avoid any confusion, it is more usual to write:

$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}} \tag{7.921}$$

In the same way as above, we can calculate the standard deviation of the variance, which is of great importance in the practice of finance:

$$V\left(\frac{(n-1)S^2}{\sigma^2}\right) = \frac{(n-1)^2}{\sigma^4}V(S^2) = V\left(\chi^2(n-1)\right) = 2(n-1) \;\Rightarrow\; V(S^2) = \frac{2\sigma^4}{n-1},\quad \sigma_{S^2} = \sigma^2\sqrt{\frac{2}{n-1}} \tag{7.922}$$

#### 7.9.4 C.I. on the Mean with known unbiased Variance

We proved much earlier that the Student distribution comes from the following relation:

$$T(k) = \frac{Z}{\sqrt{U/k}} \tag{7.923}$$

if $Z$ and $U$ are independent random variables, $Z$ following a standard Normal distribution $\mathcal{N}(0,1)$ and $U$ a chi-square distribution $\chi^2(k)$, with density:

$$f_k(t) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\Gamma\left(\frac{k}{2}\right)\sqrt{k\pi}}\left(1+\frac{t^2}{k}\right)^{-\frac{k+1}{2}} \tag{7.924}$$

and remember that its density function is symmetrical! Here is a very important application of the above result. Suppose that $X_1, \dots, X_n$ is a random sample of size $n$ from a $\mathcal{N}(\mu,\sigma)$ distribution. Following the developments made above, we can already write:

$$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \tag{7.925}$$

And for $U$, which follows a $\chi^2(k)$ distribution, if we set $k = n-1$, then according to the results above:

$$U = \frac{(n-1)S^2}{\sigma^2} = \chi^2(n-1) \tag{7.926}$$

After some trivial simplifications we then get:

$$\frac{Z}{\sqrt{U/k}} = \frac{\dfrac{\bar X-\mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2(n-1)}}} = \frac{\bar X-\mu}{\sigma/\sqrt{n}}\cdot\frac{\sigma}{S} = \frac{\bar X-\mu}{S/\sqrt{n}} \tag{7.927}$$

So, since:

$$T(k) = \frac{Z}{\sqrt{U/k}} \tag{7.928}$$

follows a Student distribution with parameter $k$, we get the "independent one-sample t-test", or simply "one-sample t-test":

$$\frac{\bar X-\mu}{S/\sqrt{n}} = T(n-1) \tag{7.929}$$

which also follows a Student distribution with parameter $n-1$ and is widely used in laboratories for calibration testing.

This also gives us, after rearrangement:

$$\mu = \bar X - \frac{S}{\sqrt{n}}T(n-1) \tag{7.930}$$

This allows us to make inference about the mean $\mu$ of a Normal distribution when the theoretical standard deviation is unknown (meaning that there are not enough experimental values) but the unbiased estimator of the standard deviation is known. It is this result that gives us the confidence interval:

$$\bar X - \frac{S}{\sqrt{n}}T_{\alpha/2}(n-1) \le \mu \le \bar X + \frac{S}{\sqrt{n}}T_{\alpha/2}(n-1) \tag{7.931}$$

where we see the same factors as for the statistical inference on the mean of a (theoretical) random variable with known standard deviation, since the Student distribution is asymptotically equal to the Normal distribution for large values of $n$.
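Python's standard library has no equivalent of Excel's `T.INV`, so the sketch below obtains the Student quantile $T_{\alpha/2}(n-1)$ by integrating the density (7.924) numerically (Simpson's rule) and inverting the result by bisection, then builds the interval (7.931). The helper names and the sample data are hypothetical, and this is a teaching sketch rather than a production routine; in practice one would rather call a tested library function such as SciPy's `scipy.stats.t.ppf`.

```python
import math
from statistics import fmean, stdev


def student_pdf(t: float, k: int) -> float:
    """Student density f_k(t) of relation (7.924); lgamma avoids overflow for large k."""
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)
    return math.exp(log_c) * (1 + t * t / k) ** (-(k + 1) / 2)


def student_cdf(x: float, k: int, steps: int = 2000) -> float:
    """P(T(k) <= x): 0.5 plus Simpson's rule on [0, x] (the density is symmetric)."""
    if x < 0:
        return 1.0 - student_cdf(-x, k, steps)
    h = x / steps  # steps must be even for Simpson's rule
    total = student_pdf(0.0, k) + student_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_pdf(i * h, k)
    return 0.5 + total * h / 3


def student_quantile(p: float, k: int) -> float:
    """t such that P(T(k) <= t) = p, for 0.5 <= p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while student_cdf(hi, k) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if student_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(data, confidence: float = 0.95):
    """Interval (7.931): X_bar -/+ T_{alpha/2}(n-1) * S / sqrt(n), with S unbiased."""
    n = len(data)
    t = student_quantile(1 - (1 - confidence) / 2, n - 1)
    margin = t * stdev(data) / math.sqrt(n)  # statistics.stdev divides by n - 1
    x_bar = fmean(data)
    return x_bar - margin, x_bar + margin


if __name__ == "__main__":
    print(round(student_quantile(0.975, 9), 3))       # 2.262, as in Student tables
    print(round(student_quantile(0.975, 10_000), 3))  # 1.96: close to the Normal quantile
    print(t_interval([9.8, 10.2, 10.1, 9.9, 10.0]))
```

Comparing the two printed quantiles gives the same kind of check as the Excel comparison of `T.INV` and `NORM.S.INV`: for large $n$ the Student quantile approaches 1.96.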
Thus, the interval (7.931) and the following interval:

$$\bar X - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \tag{7.932}$$

give very similar values (to three decimal places) for $n$ of around 10,000 (in practice they are considered the same from $n \approx 100$...).

Using the stability property of the chi-square distribution (proved above, this property arising from the Gamma distribution), we immediately deduce the following test, which we find in many statistical software packages:

$$(\bar X_2 - \bar X_1) - T_{\alpha/2}(n-2)\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \le \mu_2 - \mu_1 \le (\bar X_2 - \bar X_1) + T_{\alpha/2}(n-2)\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \tag{7.933}$$

named "bilateral t (Student) test on the difference of two means" or more simply "two-sample t-test". We can of course also determine the probability that the mean lies inside or outside a certain range... (both cases being widely used in industry).

The reader can, for fun, check with Microsoft Excel 11.8346 that for a large number of measurements $n$ the Student distribution tends to the standard Normal distribution, by comparing the values of the two functions below:

`=T.INV(5%/2,n-1)`

`=NORM.S.INV(5%/2)`

Remark: The previous result was obtained by William S. Gosset around 1910. Gosset, who had studied mathematics and chemistry, worked as a statistician for the Guinness brewery in Dublin, Ireland.
At that time, it was known that if $X_1, \dots, X_n$ are independent and identically distributed Normal random variables, then:

$$\frac{\bar X-\mu}{\sigma/\sqrt{n}} = \mathcal{N}(0,1) \tag{7.934}$$

However, in statistical applications one is obviously more interested in the following quantity:

$$\frac{\bar X-\mu}{S/\sqrt{n}} \tag{7.935}$$

It was then simply assumed that this quantity approximately followed a standard Normal distribution, which was not a bad approximation, as the image below shows ($df = n-1$):

[Figure 4.109 - Comparison between the Normal and the Student distribution functions]

After numerous simulations, Gosset came to the conclusion that this approximation was valid only when $n$ is large enough (which indicated to him that the central limit theorem had to be at work somewhere behind it). He decided to determine the origin of the distribution and, after completing a course in statistics with Karl Pearson, obtained his famous result, which he published under the pseudonym Student. This is why we call "Student distribution" the law that should have been called the Gosset distribution.

Finally, note that the Student t-test is also used to identify whether changes (increase or decrease) in the average of two identical populations are statistically significant. That is to say, if the two dependent samples have the same size, we can build the following test (we included the different ways of writing it that can be found in the literature and in many software packages implementing this test):

$$T(n-1) = \frac{(\bar X_2 - \bar X_1) - \delta_0}{S_d/\sqrt{n}} = \frac{\bar d - \delta_0}{S_d/\sqrt{n}} \tag{7.936}$$

with:

$$S_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar d\right)^2}, \qquad d_i = X_{2,i} - X_{1,i}, \qquad \bar d = \bar X_2 - \bar X_1 \tag{7.937}$$

and $\delta_0$ the difference of means assumed under the tested hypothesis (usually zero).

Relation (7.936) is very useful for comparing the same sample twice in different measurement situations (sales before or after a discount on an article, for example). It is called "t-test (Student) on the averages of two paired samples (or dependent samples)", or more simply "paired sample t-test".
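The paired t-test (7.936)-(7.937) reduces to a one-sample test on the differences $d_i$. The Python sketch below computes the statistic for hypothetical before/after sales figures of ten articles (invented for illustration) and compares it with the two-sided 5% critical value of the Student distribution with 9 degrees of freedom, 2.262, as read from standard Student tables; the helper name is hypothetical.

```python
import math
from statistics import fmean, stdev

T_CRIT_0975_DF9 = 2.262  # two-sided 5% critical value, 9 degrees of freedom (Student tables)


def paired_t_statistic(before, after, delta0=0.0):
    """Return (T, degrees of freedom) for the paired t-test on d_i = after_i - before_i."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same size")
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    s_d = stdev(d)  # unbiased standard deviation of the differences (divisor n - 1)
    return (fmean(d) - delta0) / (s_d / math.sqrt(n)), n - 1


# Hypothetical sales of ten articles before and after a discount
before = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
after = [14, 16, 12, 17, 13, 18, 13, 16, 16, 15]

t_stat, df = paired_t_statistic(before, after)
print(round(t_stat, 3), df)  # 5.582 9
print("significant at 5%:", abs(t_stat) > T_CRIT_0975_DF9)
```

Working on the differences removes the variability between articles, which is the whole point of pairing: the two samples are not treated as independent.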
Definition (#98): We speak of "paired samples" if the sample values are taken twice on the same individuals (i.e. the values within each pair are not independent, unlike two samples taken independently).

7.9.5 Binomial Exact Test

Often we want to compare two small samples taken randomly (without replacement!) from a population that is itself small, to know whether they differ in a statistically significant way when we were expecting perfect equality. We are looking for a suitable test for the following cases:

• To know whether the sample of a population prefers a given technical working method rather than another, when we expect that the population prefers neither one nor the other
• To know whether the sample of a population has a predominant characteristic among two possibilities, when we expect that the population is well balanced

Before going into further details, let us recall that we must be extremely careful about how the two samples are obtained. The experiment must be unbiased, that is to say, the sampling protocol must not favor either of the two characteristics of the population (if you study the balance between men and women in a population by attracting people to the survey with a gift in the form of jewelry, or simply by calling during working days, you will get a biased sample... because you will probably naturally reach more women than men...).
This said, this situation corresponds to a binomial distribution, for which we proved earlier in this chapter that the probability of k successes in a population of size N with a probability of success p (the probability of failure q being therefore 1 - p) is given by the relation:

$P(N,k) = C_N^k p^k q^{N-k} = C_N^k p^k (1-p)^{N-k}$ (7.938)

In the case we are interested in, we have p = q = 0.5:

$P(N,k) = C_N^k\, 0.5^k\, 0.5^{N-k} = 0.5^N C_N^k$ (7.939)

while remembering that the distribution is discrete (and, for p = 0.5, symmetric since $C_N^k = C_N^{N-k}$), which matters especially if the population size N is small. If we now denote by x the number of successes (considered as the size of the first sample) and by y the number of failures (considered as the size of the second sample), then we have:

$P(N,k) = 0.5^N C_N^k = P(x+y,k) = 0.5^{x+y} C_{x+y}^k$ (7.940)

This being done, to build the two-tailed test we calculate the cumulative probability that k is smaller than or equal to the x obtained by the experiment and add it to the cumulative probability that k is greater than or equal to the y obtained by the experiment (which correspond respectively to the left and right tails of the distribution, taking x ≤ y). This sum corresponds to the probability:

$P = 0.5^N\sum_{k=0}^{x} C_N^k + 0.5^N\sum_{k=y}^{N} C_N^k$ (7.941)

and this last relation is called the "binomial exact test (two-tailed)".

If the probability P obtained by the sum is above a cumulative probability $\alpha$ fixed in advance, then we say that the difference from a random sample of a perfectly balanced population is not statistically significant (bilaterally...); conversely, if it is below, the difference is statistically significant and we therefore reject the assumed balance. Therefore, if:

$0.5^N\sum_{k=0}^{x} C_N^k + 0.5^N\sum_{k=y}^{N} C_N^k = 0.5^N\left(\sum_{k=0}^{x} C_N^k + \sum_{k=y}^{N} C_N^k\right) > \alpha$ (7.942)

the difference from a balanced population will be considered not statistically significant. Often we take $\alpha$ at most equal to 5% (but rarely below), which corresponds to a confidence level of 95%.

Unfortunately, the required parameters or results will not necessarily be the same from one statistical software package to another (spreadsheet software, for example, does not include a specific function for the binomial test, so you will often have to build a table or write a function yourself). For example, some software automatically calculates and imposes (which is quite logical in a sense...):

$0.5^N\left(\sum_{k=0}^{x-1} C_N^k + \sum_{k=y+1}^{N} C_N^k\right) = P$ (7.943)

Example: From a small population with two particular characteristics x and y that interest us, and which we expect to be perfectly balanced (x = y), we actually obtained x = 5 and y = 7. We would like to check with Microsoft Excel 11.8346 whether this difference is statistically significant at the 5% level.

To answer this question, we calculate the cumulative probability:

$0.5^N\left(\sum_{k=0}^{x} C_N^k + \sum_{k=y}^{N} C_N^k\right) = 0.5^{12}\left(\sum_{k=0}^{5} C_{12}^k + \sum_{k=7}^{12} C_{12}^k\right)$ (7.944)

which gives us:

| k | Binomial Coeff. |
|---|---|
| 0 | 1 |
| 1 | 12 |
| 2 | 66 |
| 3 | 220 |
| 4 | 495 |
| 5 | 792 |
| 7 | 792 |
| 8 | 495 |
| 9 | 220 |
| 10 | 66 |
| 11 | 12 |
| 12 | 1 |
| P | 0.774 |

Figure 4.110 - Calculated values of the binomial coefficients in Microsoft Excel 11.8346

that is, explicitly:

| | A | B |
|---|---|---|
| 1 | k | Binomial Coeff. |
| 2 | 0 | =COMBIN(12,A2) |
| 3 | 1 | =COMBIN(12,A3) |
| 4 | 2 | =COMBIN(12,A4) |
| 5 | 3 | =COMBIN(12,A5) |
| 6 | 4 | =COMBIN(12,A6) |
| 7 | 5 | =COMBIN(12,A7) |
| 8 | 7 | =COMBIN(12,A8) |
| 9 | 8 | =COMBIN(12,A9) |
| 10 | 9 | =COMBIN(12,A10) |
| 11 | 10 | =COMBIN(12,A11) |
| 12 | 11 | =COMBIN(12,A12) |
| 13 | 12 | =COMBIN(12,A13) |
| | P | =0.5^12*SUM(B2:B13) |

Figure 4.111 - Formulas for calculating binomial coefficients in Microsoft Excel 11.8346

Since the cumulative probability is 0.774 (i.e. 77.4%), well above 5%, the difference compared with a balanced population is considered not statistically significant.
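For readers without a spreadsheet, the same two-tailed sum (7.941) and the variant (7.943) that excludes the observed counts can be sketched in a few lines of Python with `math.comb`. The function names are hypothetical, and the ordering `x <= y` is enforced inside the functions because the formula assumes x is the smaller count; capping the result at 1 when x = y (where both tails contain the same term) is a convention we add, not something stated in the text.

```python
from math import comb


def binomial_exact_two_tailed(x: int, y: int) -> float:
    """Relation (7.941): 0.5^N * (sum_{k<=x} C(N,k) + sum_{k>=y} C(N,k)), N = x + y."""
    x, y = min(x, y), max(x, y)  # the tails are taken with x <= y
    n = x + y
    left = sum(comb(n, k) for k in range(0, x + 1))
    right = sum(comb(n, k) for k in range(y, n + 1))
    # When x == y both tails contain k = x, so the sum is capped at 1.
    return min(1.0, (left + right) / 2 ** n)


def binomial_exact_strict(x: int, y: int) -> float:
    """Variant (7.943) excluding the observed counts: k <= x - 1 and k >= y + 1."""
    x, y = min(x, y), max(x, y)
    n = x + y
    left = sum(comb(n, k) for k in range(0, x))
    right = sum(comb(n, k) for k in range(y + 1, n + 1))
    return (left + right) / 2 ** n


alpha = 0.05
p = binomial_exact_two_tailed(5, 7)  # example above: x = 5, y = 7, N = 12
print(f"P = {p:.3f}", "not significant" if p > alpha else "significant")
```

With x = 5 and y = 7 the sum is 3172/4096 ≈ 0.774, the same value as in the spreadsheet, so the decision against a 5% threshold is "not significant".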
Remark: This test is also used by most statistical software (such as Minitab) to give a confidence interval for the agreement of appraisers' opinions with that of an expert. This is what we call an attribute R&R study (reproducibility & repeatability) (see my book on Minitab for an example).

7.9.6 C.I. for a Proportion

For information, some statisticians use the fact that the Normal distribution arises from the Poisson distribution, which itself derives from the binomial distribution (we have proved it when n tends to infinity and p and q are of the same order), to build a confidence interval in the context of the analysis of proportions (widely used in quality analysis in industry).

To see this, we denote by $X_i$ the random variable defined by:

$X_i = \begin{cases} 1 & \text{if the } i\text{-th element of the sample has the attribute } A \\ 0 & \text{otherwise} \end{cases}$ (7.945)

where the attribute A can be the property "defective" or "non-defective", for example, in an analysis of parts. We denote by k the number of successes for the attribute A. As we proved earlier in this chapter, the random variable

$X = X_1 + X_2 + \ldots + X_n$ (7.946)

follows a binomial distribution with parameters n and p, with the moments:

$\mu = E(X) = np, \qquad \sigma = \sqrt{V(X)} = \sqrt{npq} = \sqrt{np(1-p)}$ (7.947)

That said, we do not know the true value of p. We will use the estimator of the binomial distribution proved above:

$\hat{p} = \frac{k}{n} = \frac{X}{n}$ (7.948)

Based on the properties of the mean, we then have:

$E(\hat{p}) = \mu_{\hat{p}} = E\left(\frac{X}{n}\right) = \frac{np}{n} = p$ (7.949)

And by using the properties of the variance, we have the following relation for the variance of the sample proportion:

$V(\hat{p}) = \sigma_{\hat{p}}^2 = V\left(\frac{X}{n}\right) = \frac{V(X)}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$ (7.950)

This then brings us to:

$\mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$ (7.951)

Finally, remember that we have proved that the Normal distribution arises from the binomial distribution under certain conditions (practitioners consider it applicable when n > 50 and np > 5).
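A quick numerical check of (7.949) to (7.951): the sketch below computes the exact mean and standard deviation of $\hat{p} = X/n$ directly from the binomial probabilities and compares them with $\sqrt{p(1-p)/n}$. It also standardizes a proportion in the way the Z variable introduced next does, using the figures of example E1 (n = 75, p = 0.05, observed proportion 0.02). The helper names `phat_moments` and `proportion_z` are hypothetical; only Python's standard library is used.

```python
from math import comb, sqrt
from statistics import NormalDist


def phat_moments(n: int, p: float) -> tuple[float, float]:
    """Exact mean and standard deviation of p_hat = X / n for X ~ Binomial(n, p)."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    m = sum(k / n * w for k, w in enumerate(pmf))
    var = sum((k / n - m) ** 2 * w for k, w in enumerate(pmf))
    return m, sqrt(var)


def proportion_z(p_hat: float, p: float, n: int) -> float:
    """Standardized proportion: (p_hat - p) / sqrt(p (1 - p) / n)."""
    return (p_hat - p) / sqrt(p * (1 - p) / n)


n, p = 75, 0.05
m, s = phat_moments(n, p)
print(m, s, sqrt(p * (1 - p) / n))  # the last two values should agree, as in (7.951)

z = proportion_z(0.02, p, n)  # figures of example E1
print(round(z, 3), round(NormalDist().cdf(z), 4))  # np = 3.75 < 5: Normal approximation is doubtful
```

The exact moments match the closed forms whatever n and p are; only the Normal approximation of the last line depends on the conditions on n and np.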
In other words, the random variable X following a binomial distribution approximately follows a Normal distribution under certain conditions. Obviously, if X follows a Normal distribution, then so does X/n (and therefore $\hat{p}$...). Therefore we can center and reduce $\hat{p}$ so that it behaves as the centered reduced Normal random variable denoted by Z:

$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$ (7.952)

Examples:

E1. If 5% of the annual production of a business is defective, what is the probability that a sample of 75 parts taken from the production line contains only 2% or fewer defective parts? We therefore have:

$Z = \frac{0.02 - 0.05}{\sqrt{\frac{0.05 \cdot 0.95}{75}}} \cong -1.192$ (7.953)

The cumulative probability corresponding to this value can easily be obtained with Microsoft Excel 11.8346:

=NORMSDIST(-1.192) ≈ 11.66%

But note that the condition np > 5 is not satisfied here (np = 75 · 0.05 = 3.75), so we could refuse to use this result.

E2. In its 1998 annual report, JP Morgan explained that during the year 1998 its losses exceeded the Value at Risk (see section Economy) on 20 of the 252 working days of the year, based on a 95% daily VaR (so losses are expected to exceed the VaR on only 5% of working days). At the 95% threshold, is this just bad luck, or was the VaR model used inadequate?

$\frac{p - \hat{p}}{\sqrt{\frac{p(1-p)}{n}}} = \frac{np - n\hat{p}}{\sqrt{p(1-p)n}} = \frac{0.05 \cdot 252 - 20}{\sqrt{0.05(1-0.05)252}} \cong -2.14 < -1.96$ (7.954)

Since the statistic falls outside the interval [-1.96, 1.96], the excess of exceedances is statistically significant at the 95% level: it is unlikely to be just bad luck, which casts doubt on the VaR model used.

We can now approximate the confidence interval for the prop